Comparative Analysis of SDG Implementation Evolution Worldwide

Author

Lodrik Adam, Sofia Benczédi, Stefan Favre, Delia Fuchs

Published

December 4, 2023

1 Introduction

1.1 Overview and Motivation

The global significance of the SDGs is our starting point. The adoption of the SDGs by the United Nations in 2015 marked a significant global commitment to address pressing issues such as poverty, inequality, climate change, and more. The fact that these goals were unanimously adopted by 193 member states underscores their importance. This prompted us to ask: can we evaluate the progress? What has really been done so far? Although the SDGs have attracted considerable attention and backing, it is essential to evaluate what happened before and after their adoption. Understanding the actions taken and the progress made helps determine whether these global commitments are resulting in tangible improvements to people’s lives. By examining the evolution of all countries and their respective contributions towards achieving the SDGs, we can develop a comprehensive understanding of collective efforts and identify potential disparities or gaps.

1.3 Research questions

  1. Focus on factors: What can explain the state of the countries regarding sustainable development? (We will analyse different factors: scores from the Human Freedom Index, GDP per capita, military expenditures in % of GDP and of government expenditure, unemployment rate, internet usage.) See the data description for more precise information about the factors.

  2. Focus on time: How has the adoption of the SDGs in 2015 influenced the achievement of SDGs? (We want to compare the SDG scores of the different countries before and after 2015, since scores are calculated even for the years before the adoption, to see if the adoption of the SDGs gave a real “push” to sustainable development.)

  3. Focus on events: Is the evolution in sustainable development influenced by uncontrollable events, such as economic crises, health crises and natural disasters? (We will analyse the impact of COVID-19, natural disasters and conflicts (number of deaths, damages, etc.) on the SDG scores.) See the data description for more precise information about how the impact of these events is materialized into data.

  4. Focus on relationship between SDGs: How are the different SDGs linked? (We want to see if some SDGs are linked in the sense that a high score on one implies a high score on the other, and thus if we can make groups of SDGs that are comparable in that way.)

2 Data

2.1 Sources

We collect our data from the Sustainable Development Report (SDG scores), the International Labour Organization (ILOSTAT), the World Bank, Our World in Data, the Cato Institute, Kaggle (for disasters, as we could not find relevant accessible information elsewhere) and GitHub. These datasets contain useful information in relation to the SDGs. The details about these data are presented in the next section. Using the kableExtra package, we list our sources and the corresponding links below:

Name of the Table Source
D1_1_SDG dashboards.sdgindex.org
D2_2_Unemployment_rate ilo.org
D3_0_GDP_per_capita data.worldbank.org
D3_1_Military_expenditure_percent_GDP data.worldbank.org
D3_2_Military_expenditure_percent_gov_exp data.worldbank.org
D4_0_Internet_usage ourworldindata.org
D5_0_Human_freedom_index cato.org
D6_0_Disaters kaggle.com
D7_0_COVID github.com
D8_0_Conflicts datacatalog.worldbank.org

2.2 Description

During the wrangling process, we add data from the other datasets to our main table (D1_1_SDG), matching them on the country code and the year. The tables below show all the variables present in our 9 databases. We will then merge them to obtain our final table for the analysis.

D1_1_SDG

Our primary database focuses on the Sustainable Development Goals (SDG). Below is a table summarizing the key variables included:

Variable Name Explanation
code Country code (ISO)
country Country name
year Year of the observation (2000-2022)
overallscore Overall score on all 17 SDGs (the scores are the percentage of achievement of the goals determined by the UN, based on several indicators)
goal1:goal17 Score on each SDG except SDG 14 (16 variables)
population Population of the country

D2_2_Unemployment_rate

Variable Name Explanation
code Country code (ISO)
country Country name
year Year of the observation (2000-2022)
unemployment.rate Unemployment rate (% of the population 15 years old and older)

D3_0_GDP_per_capita

Variable Name Explanation
code Country code (ISO)
country Country name
year Year of the observation (2000-2022)
GDPpercapita GDP per capita

D3_1_Military_expenditure_percent_GDP

Variable Name Explanation
code Country code (ISO)
country Country name
year Year of the observation (2000-2022)
MilitaryExpenditurePercentGDP Military expenditures in percentage of GDP

D4_0_Internet_usage

Variable Name Explanation
code Country code (ISO)
country Country name
year Year of the observation (2000-2022)
internet.usage Internet usage (% of the population)

D5_0_Human_freedom_index

Variable Name Explanation
code Country code (ISO)
country Country name
year Year of the observation (2000-2022)
region Part of the world, group of countries (e.g. Eastern Europe, Sub-Saharan Africa, South Asia, etc.)
pf_law Rule of law, mean score of: Procedural justice, Civil justice, Criminal justice, Rule of law (V-Dem)
pf_security Security and safety, mean score of: Homicide, Disappearances, conflicts, and terrorism
pf_movement Freedom of movement (V-Dem), Freedom of movement (CLD)
pf_religion Freedom of religion, Religious organization repression
pf_assembly Civil society entry and exit, Freedom of assembly, Freedom to form/run political parties, Civil society repression
pf_expression Direct attacks on the press, Media and expression (V-Dem), Media and expression (Freedom House), Media and expression (BTI), Media and expression (CLD)
pf_identity Same-sex relationships, Divorce, Inheritance rights, Female genital mutilation
ef_gouvernment Government consumption, Transfers and subsidies, Government investment, Top marginal tax rate, State ownership of assets
ef_legal Judicial independence, Impartial courts, Protection of property rights, Military interference, Integrity of the legal system, Legal enforcement of contracts, Regulatory costs, Reliability of police
ef_money Money growth, Standard deviation of inflation, Inflation: Most recent year, Freedom to own foreign currency
ef_trade Tariffs, Regulatory trade barriers, Black-market exchange rates, Movement of capital and people
ef_regulation Credit market regulations, Labor market regulations, Business regulations

D6_0_Disaters

Variable Name Explanation
code Country code (ISO)
country Country name
year Year of the observation (2000-2022)
continent Continent affected by the disaster (disasters include, e.g., floods and hurricanes)
total_deaths Number of deaths caused by disasters
no_injured Number of people injured by disasters
no_affected Number of people affected by disasters
no_homeless Number of people made homeless by disasters
total_affected Total number of people affected by disasters
total_damages Total infrastructure damage caused by disasters

D7_0_COVID

Variable Name Explanation
code Country code (ISO)
country Country name
year Year of the observation (2000-2022)
deaths_per_million Number of COVID-19 deaths per million people
cases_per_million Number of COVID-19 cases per million people
stringency Government Response Stringency Index: composite measure based on 9 response indicators including school closures, workplace closures, and trave

D8_0_Conflicts

Variable Name Explanation
code Country code (ISO)
country Country name
year Year of the observation (2000-2022)
ongoing Variable coded 1 for more than 25 deaths in intrastate conflict and 0 otherwise according to UCDP/PRIO Armed Conflict Dataset 17.1.
sum_deaths Best estimate of deaths in all categories of violence (non-state, one-sided and state-based) recorded by the Uppsala Conflict Data Program in the country based on the UCDP GED dataset (unpublished 2016 data). The location of these events is used for estimating the extent of violence.
pop_affected Share of population affected by violence in percentage (0 to 100) measured as described above based on population data from CIESIN, the PRIO-GRID structure as well as UCDP GED.
area_affected Area affected by conflict
maxintensity Two different intensity levels are coded: minor armed conflicts (1) and wars (2). Takes the maximum intensity of conflict in the country, so that it is coded 2 if there is at least one war (>=1000 deaths in intrastate conflict) during the year. Data from UCDP/PRIO Armed Conflict Dataset 17.1.

2.3 Wrangling/cleaning

To accommodate the large scale of the datasets, we pre-cleaned each one prior to merging. This streamlined the process, simplifying the cleaning of the final, combined dataset.

2.3.1 Dataset on SDG

This is our main dataset, which we clean to keep only the columns containing the following information: country name, country code, year, population, overall score and the SDG scores.

We start by importing the data, converting it into a data frame and keeping the first 22 columns. Next, we rename the columns and convert the scores into numeric variables.

Code
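# Packages used throughout the wrangling code
library(here); library(dplyr); library(tidyr); library(ggplot2)
library(stringr); library(stringi); library(countrycode); library(forecast)

# Import the semicolon-separated SDG file
D1_0_SDG <- read.csv(here("scripts","data","SDG.csv"), sep = ";")
D1_0_SDG <- as.data.frame(D1_0_SDG)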

D1_0_SDG <- D1_0_SDG[,1:22]

colnames(D1_0_SDG) <- c("code", "country", "year", "population", "overallscore", "goal1", "goal2", "goal3", "goal4", "goal5", "goal6", "goal7", "goal8", "goal9", "goal10", "goal11", "goal12", "goal13", "goal14", "goal15", "goal16", "goal17")

D1_0_SDG[["overallscore"]] <- as.double(gsub(",", ".", D1_0_SDG[["overallscore"]]))

makenumSDG <- function(D1_0_SDG) {
  for (i in 1:17) {
    varname <- paste("goal", i, sep = "")
    D1_0_SDG[[varname]] <- as.double(gsub(",", ".", D1_0_SDG[[varname]]))
  }
  return(D1_0_SDG)
}

D1_0_SDG <- makenumSDG(D1_0_SDG)
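
The conversion above replaces decimal commas with points before casting to numeric. As a minimal sketch on made-up values (the vector `raw_scores` and the inline CSV `toy_csv` are hypothetical, not taken from the SDG file), the same idea can be checked in isolation, together with an alternative that lets `read.csv` parse decimal commas through its `dec` argument:

```r
# Hypothetical raw scores as they might appear in a semicolon-separated
# file that uses a comma as decimal mark
raw_scores <- c("67,5", "80,25", NA)

# Same approach as in makenumSDG(): swap "," for "." and cast to double
scores <- as.double(gsub(",", ".", raw_scores))
print(scores)

# Alternative: let read.csv() parse decimal commas directly via `dec`
toy_csv <- "code;overallscore\nAAA;67,5\nBBB;80,25"
toy <- read.csv(text = toy_csv, sep = ";", dec = ",")
print(toy$overallscore)
```

Whether `dec = ","` is sufficient for the real file depends on every numeric column using the same decimal mark, which we have not verified here.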

We proceed by examining the missing values.

Code
propmissing <- numeric(length(D1_0_SDG))

for (i in 1:length(D1_0_SDG)){
  proportion <- mean(is.na(D1_0_SDG[[i]]))
  propmissing[i] <- proportion
}
variable_names <- colnames(D1_0_SDG)
 
prop_missing_data <- data.frame(variable = variable_names, prop_missing = propmissing)

ggplot(prop_missing_data, aes(x = variable, y = prop_missing)) +
   geom_bar(stat = "identity", fill = "skyblue", color = "black") +
   labs(title = "Proportion of Missing Values by Variable",
        x = "Variable",
        y = "Proportion of Missing Values") +
   theme_minimal()+
   coord_flip()
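
The loop above computes the share of missing values column by column. A more compact way to obtain the same quantity, shown here as a sketch on a hypothetical toy data frame, is to take the column means of the logical matrix returned by `is.na()`:

```r
# Toy data frame (hypothetical values) with some missing entries
toy <- data.frame(
  code       = c("AAA", "AAA", "BBB", "BBB"),
  population = c(10, NA, NA, NA),
  goal1      = c(50, 55, NA, 60)
)

# is.na() returns a logical matrix; column means give the share of NAs
prop_missing <- colMeans(is.na(toy))

# Same layout as the data frame used for the bar chart
prop_missing_data <- data.frame(
  variable     = names(prop_missing),
  prop_missing = unname(prop_missing)
)
print(prop_missing_data)
```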

Observing that the ‘population’ column contains numerous NAs, we investigate and find that they come from observations that represent regions rather than countries (their codes start with an underscore). Therefore, we can safely exclude these observations.

Code
SDG0 <- D1_0_SDG |> 
  group_by(code) |> 
  select(population) |> 
  summarize(NaPop = mean(is.na(population))) |>
  filter(NaPop != 0)

ggplot(SDG0, aes(x = code, y = NaPop)) +
  geom_bar(stat = "identity", fill = "lightgreen", color = "black") +
  labs(title = "Proportion of Missing Values in 'population' by 'code'",
       x = "Code",
       y = "Proportion of Missing Values") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

D1_0_SDG <- D1_0_SDG %>%
  filter(!str_detect(code, "^_"))
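
The filter above relies on region aggregates having a code that starts with an underscore. A base R sketch of the same rule, on hypothetical codes (`_OECD` and `_World` are made-up examples of such aggregate codes):

```r
# Hypothetical mix of country rows and regional aggregates
toy <- data.frame(
  code       = c("CHE", "_OECD", "FRA", "_World"),
  population = c(8.7e6, NA, 6.8e7, NA)
)

# Keep only rows whose code does not start with an underscore
countries_only <- toy[!grepl("^_", toy$code), ]
print(countries_only)

# After the filter, no missing population should remain in this toy example
print(anyNA(countries_only$population))
```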

Now, there are no missing values in the ‘population’ variable, and we observe that it contains information on 166 countries.

We notice that NAs are present in only three SDG scores: 1, 10, and 14. Additionally, when a country has NAs, they occur across all years or not at all. Consequently, we decide to conduct further investigations on these three SDG scores to determine whether to include them in our analysis.

Code
SDG1 <- D1_0_SDG |> 
  group_by(code) |> 
  select(contains("goal")) |> 
  summarize(Na1 = mean(is.na(goal1)),
            Na2 = mean(is.na(goal2)),
            Na3 = mean(is.na(goal3)),
            Na4 = mean(is.na(goal4)),
            Na5 = mean(is.na(goal5)),
            Na6 = mean(is.na(goal6)),
            Na7 = mean(is.na(goal7)),
            Na8 = mean(is.na(goal8)),
            Na9 = mean(is.na(goal9)),
            Na10 = mean(is.na(goal10)),
            Na11 = mean(is.na(goal11)),
            Na12 = mean(is.na(goal12)),
            Na13 = mean(is.na(goal13)),
            Na14 = mean(is.na(goal14)),
            Na15 = mean(is.na(goal15)),
            Na16 = mean(is.na(goal16)),
            Na17 = mean(is.na(goal17))) |>
  filter(Na1 != 0 | Na2 != 0 | Na3 != 0| Na4 != 0| Na5 != 0| Na6 != 0| Na7 != 0| Na8 != 0| Na9 != 0| Na10 != 0| Na11 != 0| Na12 != 0| Na13 != 0| Na14 != 0| Na15 != 0| Na16 != 0| Na17 != 0)

result_list <- list()
for (col in names(SDG1)[-1]) {
  count_na <- sum(SDG1[[col]] != 0)
    temp_df <- data.frame(Goal = col, Count_NA = count_na, stringsAsFactors = FALSE)
    result_list <- c(result_list, list(temp_df))
}
result_df <- do.call(rbind, result_list)

ggplot(result_df, aes(x = reorder(Goal, Count_NA), y = Count_NA)) +
  geom_bar(stat = "identity", fill = "lightyellow", color = "black") +
  labs(title = "Count of Missing Values by Goal",
       x = "Goal",
       y = "Count of Missing Values") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

For goal 1, 15 countries (9.04% of the countries in the dataset) have missing values. Since goal 1 is “End poverty”, we decide to keep it and only remove the countries with no information from the analysis.

Code
SDG2 <- D1_0_SDG |> 
  group_by(code) |> 
  select(contains("goal")) |> 
  summarize(Na1 = mean(is.na(goal1))) |>
  filter(Na1 != 0)
country_number <- length(unique(D1_0_SDG$country))
length(unique(SDG2$code))/country_number
#> [1] 0.0904

For goal 10, 17 countries (10.2% of the countries in the dataset) have missing values. Since goal 10 is “Reduced inequalities”, we decide to keep it and only remove the countries with no information from the analysis.

Code
SDG3 <- D1_0_SDG |> 
  group_by(code) |> 
  select(contains("goal")) |> 
  summarize(Na10 = mean(is.na(goal10))) |>
  filter(Na10 != 0)

length(unique(SDG3$code))/country_number
#> [1] 0.102

For goal 14, 40 countries (24.1% of the countries in the dataset) have missing values. Goal 14 being “Life below water”, we decide not to keep it, because other SDGs such as “Life on land” and “Clean water and sanitation” already cover similar subjects.

Code
SDG4 <- D1_0_SDG |> 
  group_by(code) |> 
  select(contains("goal")) |> 
  summarize(Na14 = mean(is.na(goal14))) |>
  filter(Na14 != 0)

length(unique(SDG4$code))/country_number
#> [1] 0.241

D1_0_SDG <- D1_0_SDG %>% select(-goal14)
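
The three per-goal checks above all compute the same quantity: the share of countries with at least one missing score for a given goal. As a sketch under stated assumptions (a hypothetical toy data frame with two goal columns), the computation can be written once for every goal column:

```r
# Toy data: 4 hypothetical countries, 2 years each
toy <- data.frame(
  code   = rep(c("AAA", "BBB", "CCC", "DDD"), each = 2),
  goal1  = c(50, 52, NA, NA, 70, 71, 40, 41),
  goal14 = c(NA, NA, NA, NA, 60, 61, 30, 31)
)

goal_cols <- grep("^goal", names(toy), value = TRUE)

# For each goal: TRUE for a country if at least one of its scores is NA
has_na_by_country <- sapply(goal_cols, function(g) {
  tapply(is.na(toy[[g]]), toy$code, any)
})

# Number and share of countries with missing values, per goal
n_countries_missing     <- colSums(has_na_by_country)
share_countries_missing <- n_countries_missing / length(unique(toy$code))
print(share_countries_missing)
```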

We will work with various datasets and merge them using the country code and year as key identifiers. To ensure accurate matching, we first verify that country names are encoded in UTF-8 format. Then, we standardize the names of the countries (requiring a custom match for Turkey) and the country codes, utilizing the countrycode library. Additionally, we compile a list of all country codes from the main database to filter the other datasets. Lastly, we complete the database to include all possible “country, year” combinations, ensuring the total number of rows remains unchanged.

Code
D1_0_SDG$country <- stri_encode(D1_0_SDG$country, to = "UTF-8")

D1_0_SDG <- D1_0_SDG %>%
  mutate(country = countrycode(country, "country.name", "country.name", custom_match = c("T�rkiye"="Turkey")))

# Standardize the country codes (ISO 3166-1 alpha-3)
D1_0_SDG$code <- countrycode(
  sourcevar = D1_0_SDG$code,
  origin = "iso3c",
  destination = "iso3c"
)

list_country <- c(unique(D1_0_SDG$code))

# One row per country: unique (code, country) pairs from the main dataset
D1_0_SDG_country_list <- D1_0_SDG %>%
  select(code, country) %>%
  distinct()

Finally, we complete the database to ensure there are no missing pairs of (year, code).
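
The code for this completion step is not shown. One possible way to do it in base R, given here only as a sketch on hypothetical data, is to build every (code, year) combination and left-join the observed rows onto it:

```r
# Toy data: the pair (BBB, 2001) is absent
toy <- data.frame(
  code         = c("AAA", "AAA", "BBB"),
  year         = c(2000L, 2001L, 2000L),
  overallscore = c(60.1, 60.5, 71.2)
)

# Every combination of observed codes and the years of interest
all_pairs <- expand.grid(
  code = unique(toy$code),
  year = 2000:2001,
  stringsAsFactors = FALSE
)

# Left join: pairs absent from the data get NA scores
completed <- merge(all_pairs, toy, by = c("code", "year"), all.x = TRUE)
completed <- completed[order(completed$code, completed$year), ]
print(completed)
```

In a tidyverse pipeline, `tidyr::complete(code, year)` serves the same purpose. If the number of rows is unchanged after completion, every pair was already present.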

Here are the first few lines of the cleaned dataset on SDG achievement scores:

For this first dataset, we reduced the size from 4,140 observations across 120 variables to 3,818 observations for 21 variables.

As said, this is now our main dataset. All subsequent datasets will be merged with this dataset. Therefore, for all the following datasets, we will make sure that we only keep data for the same countries and years as in this dataset. We have a total of 166 countries and the years range from 2000 to 2022.
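
To illustrate what merging on the country code and the year means in practice, here is a minimal sketch with two hypothetical tables (`main` and `extra` are made-up stand-ins, not the real datasets). A left join keeps exactly the rows of the main table and drops countries that are not part of it:

```r
# Hypothetical main table (country-year rows)
main <- data.frame(
  code         = c("AAA", "AAA", "BBB"),
  year         = c(2000L, 2001L, 2000L),
  overallscore = c(60.1, 60.5, 71.2)
)

# Hypothetical secondary table; CCC is not in the main table
extra <- data.frame(
  code         = c("AAA", "BBB", "CCC"),
  year         = c(2000L, 2000L, 2000L),
  GDPpercapita = c(1200, 3400, 900)
)

# Left join on both keys: main rows are kept, unmatched values become NA
merged <- merge(main, extra, by = c("code", "year"), all.x = TRUE)
print(merged)
```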

2.3.2 Dataset on Unemployment rate

In this dataset, the initial step involves importing the data. Next, we ensure that the names and codes of the countries are formatted in UTF-8, preventing any discrepancies due to mismatches in country names. Following this, we modify the column names and filter the data to include only the relevant countries and years, specifically the years 2000 to 2022, covering 166 countries from our primary dataset.

Code
D2_1_Unemployment_rate <- read.csv(here("scripts","data","UnemploymentRate.csv")) %>%
  as.data.frame() %>%
  mutate(
    country = iconv(ref_area.label, to = "UTF-8", sub = "byte"),
    country = countrycode(country, "country.name", "country.name"),
    year = time,
    `unemployment rate` = obs_value / 100,
    age_category = classif1.label,
    sex = sex.label
  ) %>%
  select(-ref_area.label, -time, -obs_value, -classif1.label, -sex.label, -source.label, -obs_status.label, -indicator.label) %>%
  merge(D1_0_SDG_country_list[, c("country", "code")], by = "country", all.x = TRUE) %>%
  filter(year >= 2000 & year <= 2022,
         !str_detect(sex, fixed("Male")) & !str_detect(sex, fixed("Female")),
         code %in% D1_0_SDG_country_list$code,
         age_category == "Age (Youth, adults): 15+") %>%
  select(code, country, year, `unemployment rate`) %>%
  distinct()

Here are the first few lines of the cleaned dataset on Unemployment rate:

For this dataset, we reduced the size from 82,800 observations across 8 variables to 3,812 observations for 5 variables.

2.3.3 Datasets on GDP and military expenditures

We have three different databases which contain information on each country over the years. Each year is stored as a separate variable (column). We want to extract three variables for our analysis: GDP per capita, military expenditures in percentage of GDP and military expenditures in percentage of government expenditures.

Code
GDPpercapita <-
  read.csv(here("scripts","data","GDPpercapita.csv"), sep = ";")
MilitaryExpenditurePercentGDP <-
  read.csv(here("scripts","data","MilitaryExpenditurePercentGDP.csv"), sep = ";")
MiliratyExpenditurePercentGovExp <-
  read.csv(here("scripts","data","MiliratyExpenditurePercentGovExp.csv"), sep = ";")

After importing the data, we replace the invalid country codes with the value of the column Indicator.Name. We realized after some manipulations that some of the country codes were wrong, and that in those rows the next column contained the right ones.

Code
fill_code <- function(data){
  data <- data %>%
    mutate(Country.Code = ifelse(!grepl("^[A-Z]{3}$", Country.Code), Indicator.Name, Country.Code))
}
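
To make the rule in `fill_code` concrete, here is a sketch on a hypothetical three-row table that mimics the shifted columns described in the text: any `Country.Code` that is not exactly three capital letters is replaced by the value found in `Indicator.Name`.

```r
# Hypothetical rows: the last two have a wrong code, and the real code
# sits in the next column (Indicator.Name)
toy <- data.frame(
  Country.Name   = c("Switzerland", "Bahamas", "Congo"),
  Country.Code   = c("CHE", "The", "Dem. Rep."),
  Indicator.Name = c("GDP per capita", "BHS", "COD"),
  stringsAsFactors = FALSE
)

# A valid ISO3 code is exactly three capital letters
valid <- grepl("^[A-Z]{3}$", toy$Country.Code)

# Keep valid codes, otherwise take the value from Indicator.Name
toy$Country.Code <- ifelse(valid, toy$Country.Code, toy$Indicator.Name)
print(toy$Country.Code)
```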

We create a set of functions that we will apply to each database. First, we remove the variables that we do not need, namely the years before 2000. Second, we make sure that the values are numeric and rename the year variables (because they all had an “X” before the year number). Third, we transform the database from wide to long format, in order to match the main database. Fourth, we transform the year variable into an integer variable and rearrange and rename the columns to match those of the other databases. Then, we apply these transformations to the three databases.

Code
remove <- function(data){
  years <- seq(1960, 1999)
  removeyears <- paste("X", years, sep = "")
  data <- data[, !(names(data) %in% c("Indicator.Name", "Indicator.Code", "X", removeyears))]
}

makenum <- function(data) {
  for (i in 2000:2022) {
    year <- paste("X", i, sep = "")
    data[[year]] <- as.numeric(data[[year]])
  }
  return(data)
}

renameyear <- function(data) {
  for (i in 2000:2022) {
    varname <- paste("X", i, sep = "")
    names(data)[names(data) == varname] <- gsub("X", "", varname)
  }
  return(data)
}

wide2long <- function(data) {
  data <- pivot_longer(data, 
                       cols = -c("Country.Name", "Country.Code"), 
                       names_to = "year", 
                       values_to = "data")
  return(data)
}

yearint <- function(data) {
  data$year <- as.integer(data$year)
  return(data)
}

nameorder <- function(data) {
  colnames(data) <- c("country", "code", "year", "data")
  data <- data %>% select(c("code", "country", "year", "data"))
}

cleanwide2long <- function(data){
  data <- fill_code(data)
  data <- remove(data)
  data <- makenum(data)
  data <- renameyear(data)
  data <- wide2long(data)
  data <- yearint(data)
  data <- nameorder(data)
}

GDPpercapita <- cleanwide2long(GDPpercapita)
MilitaryExpenditurePercentGDP <- cleanwide2long(MilitaryExpenditurePercentGDP)
MiliratyExpenditurePercentGovExp <- cleanwide2long(MiliratyExpenditurePercentGovExp)
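
The wide-to-long step is the core of this pipeline. As an illustration only, the following base R sketch performs the same reshaping on a hypothetical two-country table without `pivot_longer` (the names `wide` and `long` and all values are made up):

```r
# Hypothetical wide table: one column per year, prefixed with "X"
wide <- data.frame(
  Country.Name = c("Aland", "Bland"),
  Country.Code = c("AAA", "BBB"),
  X2000 = c(1.0, 2.0),
  X2001 = c(1.5, NA)
)

year_cols <- grep("^X[0-9]{4}$", names(wide), value = TRUE)

# Stack the year columns: unlist() goes column by column, so the country
# vectors are repeated as a whole and each year is repeated once per country
long <- data.frame(
  code    = rep(wide$Country.Code, times = length(year_cols)),
  country = rep(wide$Country.Name, times = length(year_cols)),
  year    = rep(as.integer(sub("^X", "", year_cols)), each = nrow(wide)),
  data    = unlist(wide[year_cols], use.names = FALSE)
)
long <- long[order(long$code, long$year), ]
print(long)
```

The result has one row per (country, year) pair, which is the layout of the main SDG table.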

We rename the columns containing the main information, standardize the country codes and remove the countries that are not in our main database. We see that all 166 countries are there.

Code
GDPpercapita <- GDPpercapita %>%
  rename(GDPpercapita = data)
MilitaryExpenditurePercentGDP <- MilitaryExpenditurePercentGDP %>%
  rename(MilitaryExpenditurePercentGDP = data)
MiliratyExpenditurePercentGovExp <- MiliratyExpenditurePercentGovExp %>%
  rename(MiliratyExpenditurePercentGovExp = data)

# Standardize the country codes (ISO 3166-1 alpha-3) in the three databases
GDPpercapita$code <- countrycode(
  sourcevar = GDPpercapita$code,
  origin = "iso3c",
  destination = "iso3c"
)

MilitaryExpenditurePercentGDP$code <- countrycode(
  sourcevar = MilitaryExpenditurePercentGDP$code,
  origin = "iso3c",
  destination = "iso3c"
)

MiliratyExpenditurePercentGovExp$code <- countrycode(
  sourcevar = MiliratyExpenditurePercentGovExp$code,
  origin = "iso3c",
  destination = "iso3c"
)

GDPpercapita <- GDPpercapita %>% filter(code %in% list_country)
length(unique(GDPpercapita$code))
#> [1] 166

MilitaryExpenditurePercentGDP <- MilitaryExpenditurePercentGDP %>% filter(code %in% list_country)
length(unique(MilitaryExpenditurePercentGDP$code))
#> [1] 166

MiliratyExpenditurePercentGovExp <- MiliratyExpenditurePercentGovExp %>% filter(code %in% list_country)
length(unique(MiliratyExpenditurePercentGovExp$code))
#> [1] 166

Initially, only 157 countries were both in the main SDG dataset and in these 3 datasets, but we suspected that some of the missing countries were in the databases but not correctly matched. Indeed, Bahamas was in the database, but instead of the code “BHS” there was “The”; for “COD” it was “Dem. Rep.”, for “COG” it was “Rep”, etc. We noticed that the correct code is in another column of the initial database: “Indicator.Name”. We went back to the initial database and put the right codes before cleaning it (as seen above). After rerunning the code, we see that we have all 166 countries from the main dataset.

Code
list_country_GDP <- c(unique(GDPpercapita$code))
(missing <- setdiff(list_country, list_country_GDP))
#> character(0)

We run a first round of investigation of the missing values and find that we have 16.4% for MiliratyExpenditurePercentGovExp, 12.9% for MilitaryExpenditurePercentGDP and 1.31% for GDPpercapita.

Code
mean(is.na(MiliratyExpenditurePercentGovExp$MiliratyExpenditurePercentGovExp))
#> [1] 0.164
mean(is.na(MilitaryExpenditurePercentGDP$MilitaryExpenditurePercentGDP))
#> [1] 0.129
mean(is.na(GDPpercapita$GDPpercapita))
#> [1] 0.0131

2.3.3.1 GDP per capita

For GDPpercapita, 11 countries in total have missing values, and only two of them (SOM and SSD) have a lot of missing values.

Code
GDPpercapita1 <- GDPpercapita %>%
  group_by(code) %>%
  summarize(NaGDP = mean(is.na(GDPpercapita))) %>%
  filter(NaGDP != 0)

ggplot(GDPpercapita1, aes(x = reorder(code, NaGDP), y = NaGDP, fill = code)) +
  geom_bar(stat = "identity", fill="#FFEDCC", color="black") +
  labs(title = "Proportion of Missing Values in 'GDPpercapita' by 'code'",
       x = "Code",
       y = "Proportion of Missing Values") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) 

We plot the evolution of GDPpercapita over the years for each country containing missing values and distinguish the percentage of missing values with colors.

Code
filtered_data_GDP <- GDPpercapita %>%
  filter(code %in% GDPpercapita1$code) # countries with NAs

filtered_data_GDP <- filtered_data_GDP %>%
  group_by(code) %>%
  mutate(PercentageMissing = mean(is.na(GDPpercapita))) %>% # column % NAs
  ungroup()

Evol_Missing_GDP <- ggplot(data = filtered_data_GDP) +
  geom_point(aes(x = year, y = GDPpercapita, 
                 color = cut(PercentageMissing,
                             breaks = c(0, 0.1, 0.2, 1),
                             labels = c("0-10%", "10-20%", "20-100%")))) +
  labs(title = "Evolution of GDP per capita over time", x = "Year", y = "GDP per capita") +
  scale_color_manual(values = c("0-10%" = "blue", "10-20%" = "green", "20-100%" = "black"),
                     labels = c("0-10%", "10-20%", "20-100%")) +
  guides(color = guide_legend(title = "% missings")) +
  facet_wrap(~ code, nrow = 4)

print(Evol_Missing_GDP)
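
The colors in this plot come from binning each country's share of missing values with `cut()`. A small sketch on hypothetical shares shows how the breaks `c(0, 0.1, 0.2, 1)` translate into bins; with these breaks the last bin covers everything above 20%, which is why it is labelled "20-100%" here:

```r
# Hypothetical shares of missing values for three countries
share_missing <- c(0.04, 0.13, 0.65)

# Intervals are right-closed by default: (0, 0.1], (0.1, 0.2], (0.2, 1]
bins <- cut(share_missing,
            breaks = c(0, 0.1, 0.2, 1),
            labels = c("0-10%", "10-20%", "20-100%"))
print(bins)

# A share of exactly 0 falls outside (0, 0.1] and becomes NA
# unless include.lowest = TRUE is set
print(cut(0, breaks = c(0, 0.1, 0.2, 1)))
```

This edge case does not matter for the plot, since only countries with at least one missing value are kept.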

For the countries with less than 30% of missing values and a linear evolution in time, we fill the missing values using linear interpolation.

Code
list_code <- c("AFG", "BTN", "CUB", "STP", "TKM")

for (i in list_code) {
  country_data <- GDPpercapita %>% filter(code == i)
  interpolated_data <- na.interp(country_data$GDPpercapita)
  GDPpercapita[GDPpercapita$code == i, "GDPpercapita"] <- interpolated_data
}
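
To show what linear interpolation does to a series with gaps, here is a sketch that uses base R's `approx()` in place of `na.interp()` (a swapped-in function for illustration, on hypothetical values). For a non-seasonal series, both approaches fill interior gaps by drawing a straight line between the neighbouring observed points:

```r
# Hypothetical yearly series with interior gaps
year <- 2000:2005
gdp  <- c(10, NA, 14, NA, NA, 20)

observed <- !is.na(gdp)

# Linear interpolation between observed points, evaluated at every year
filled <- approx(x = year[observed], y = gdp[observed], xout = year)$y
print(filled)
```

With its default settings, `approx()` leaves values before the first or after the last observation as NA, so this sketch does not extrapolate.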

2.3.3.2 Military expenditures in percentage of GDP

For MilitaryExpenditurePercentGDP, 12 countries have 100% of missing values. We investigate further and keep them for now, knowing that some of these countries may also have many missing values in the other databases when we merge everything, and will then be dropped.

Code
MilitaryExpenditurePercentGDP1 <- MilitaryExpenditurePercentGDP %>%
  group_by(code) %>%
  summarize(NaMil1 = round(mean(is.na(MilitaryExpenditurePercentGDP)),3)) %>%
  filter(NaMil1 != 0)

ggplot(MilitaryExpenditurePercentGDP1, aes(x = reorder(code, NaMil1), y = NaMil1, fill = code)) +
  geom_bar(stat = "identity", fill="#FFC0CB", color="black") +
  labs(title = "Proportion of Missing Values in 'MilitaryExpenditurePercentGDP' by 'code'",
       x = "Code",
       y = "Proportion of Missing Values") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) 

We plot the evolution of MilitaryExpenditurePercentGDP along the years for each country containing missing values and distinguish the percentage of missing values with colors.

Code
filtered_data_Mil1 <- MilitaryExpenditurePercentGDP %>%
  filter(code %in% MilitaryExpenditurePercentGDP1$code) # countries with NAs

filtered_data_Mil1 <- filtered_data_Mil1 %>%
  group_by(code) %>%
  mutate(PercentageMissing = mean(is.na(MilitaryExpenditurePercentGDP))) %>% # Column % NAs
  ungroup()

Evol_Missing_Mil1 <- ggplot(data = filtered_data_Mil1) +
  geom_line(aes(x = year, y = MilitaryExpenditurePercentGDP, 
                 color = cut(PercentageMissing,
                             breaks = c(0, 0.1, 0.2, 0.3, 1),
                             labels = c("0-10%", "10-20%", "20-30%", "30-100%")))) +
  labs(title = "Military expenditure in % of GDP over time", x = "Years from 2000 to 2022", y = "Military expenditure (% of GDP)") +
  scale_color_manual(values = c("0-10%" = "blue", "10-20%" = "green", "20-30%" = "red", "30-100%" = "black"),
                     labels = c("0-10%", "10-20%", "20-30%", "30-100%")) +
  guides(color = guide_legend(title = "% missings")) +
  facet_wrap(~ code, nrow = 6) +
  theme(strip.text = element_text(size = 6)) +
  scale_x_continuous(breaks = NULL) +
  scale_y_continuous(breaks = NULL)

print(Evol_Missing_Mil1)

For the countries with less than 30% of missing values and a linear evolution in time, we fill the missing values using linear interpolation.

Code
list_code <- c("AFG", "BDI", "BEN", "CAF", "CIV", "COD", "GAB", "GMB", "KAZ", "LBN", "LBR", "MNE", "MRT", "NER", "TJK", "TTO", "ZMB")

for (i in list_code) {
  country_data <- MilitaryExpenditurePercentGDP %>% filter(code == i)
  interpolated_data <- na.interp(country_data$MilitaryExpenditurePercentGDP)
  MilitaryExpenditurePercentGDP[MilitaryExpenditurePercentGDP$code == i, "MilitaryExpenditurePercentGDP"] <- interpolated_data
}

2.3.3.3 Military expenditures in percentage of government expenditures

For MiliratyExpenditurePercentGovExp, 17 countries have 100% of missing values. We investigate further and keep them for now, knowing that some of these countries may also have many missing values in the other databases when we merge everything, and will then be dropped.

Code
MiliratyExpenditurePercentGovExp1 <- MiliratyExpenditurePercentGovExp %>%
  group_by(code) %>%
  summarize(NaMil2 = round(mean(is.na(MiliratyExpenditurePercentGovExp)),3)) %>%
  filter(NaMil2 != 0)

ggplot(MiliratyExpenditurePercentGovExp1, aes(x = reorder(code, NaMil2), y = NaMil2, fill = code)) +
  geom_bar(stat = "identity", fill="#E6E6FA", color="black") +
  labs(title = "Proportion of Missing Values in 'MiliratyExpenditurePercentGovExp' by 'code'",
       x = "Code",
       y = "Proportion of Missing Values") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, size=8)) 

We plot the evolution of MiliratyExpenditurePercentGovExp over the years for each country containing missing values and distinguish the percentage of missing values with colors.

Code
filtered_data_Mil2 <- MiliratyExpenditurePercentGovExp %>%
  filter(code %in% MiliratyExpenditurePercentGovExp1$code) # Countries with NAs

filtered_data_Mil2 <- filtered_data_Mil2 %>%
  group_by(code) %>%
  mutate(PercentageMissing = mean(is.na(MiliratyExpenditurePercentGovExp))) %>% # Column % NAs
  ungroup()

Evol_Missing_Mil2 <- ggplot(data = filtered_data_Mil2) +
  geom_line(aes(x = year, y = MiliratyExpenditurePercentGovExp, 
                 color = cut(PercentageMissing,
                             breaks = c(0, 0.1, 0.2, 0.3, 1),
                             labels = c("0-10%", "10-20%", "20-30%", "30-100%")))) +
  labs(title = "Military expenditure in % of government expenditures over time", x = "Years from 2000 to 2022", y = "Military expenditure (% of government expenditures)") +
  scale_color_manual(values = c("0-10%" = "blue", "10-20%" = "green", "20-30%" = "red", "30-100%" = "black"),
                     labels = c("0-10%", "10-20%", "20-30%", "30-100%")) +
  guides(color = guide_legend(title = "% missings")) +
  facet_wrap(~ code, nrow = 7) +
  theme(strip.text = element_text(size = 6)) +
  scale_x_continuous(breaks = NULL) +
  scale_y_continuous(breaks = NULL)

print(Evol_Missing_Mil2)

For the countries with less than 30% of missing values and a linear evolution in time, we fill the missing values using linear interpolation.

Code
list_code <- c("AFG", "ARM", "BEN", "BIH", "BLR", "COG", "ECU", "GAB", "GMB", "KAZ", "LBN", "LBR", "MNE", "MWI", "NER", "TTO", "UKR", "ZMB")

for (i in list_code) {
  country_data <- MiliratyExpenditurePercentGovExp %>% filter(code == i)
  interpolated_data <- na.interp(country_data$MiliratyExpenditurePercentGovExp)
  MiliratyExpenditurePercentGovExp[MiliratyExpenditurePercentGovExp$code == i, "MiliratyExpenditurePercentGovExp"] <- interpolated_data
}

We now look again at the percentage of missing values for the three databases: 14.9% for MiliratyExpenditurePercentGovExp, 11.6% for MilitaryExpenditurePercentGDP and 1.07% for GDPpercapita.

Code
mean(is.na(MiliratyExpenditurePercentGovExp$MiliratyExpenditurePercentGovExp))
#> [1] 0.149
mean(is.na(MilitaryExpenditurePercentGDP$MilitaryExpenditurePercentGDP))
#> [1] 0.116
mean(is.na(GDPpercapita$GDPpercapita))
#> [1] 0.0107

D3_1_GDP_per_capita <- GDPpercapita
D3_2_Military_Expenditure_Percent_GDP <- MilitaryExpenditurePercentGDP
D3_3_Miliraty_Expenditure_Percent_Gov_Exp <- MiliratyExpenditurePercentGovExp

Here are the first few lines of the cleaned dataset of GDP per capita:

For this dataset, we went from ??? observations for 68 variables to 3818 observations for 4 variables.

Here are the first few lines of the cleaned dataset of military expenditures in percentage of GDP:

For this dataset, we went from ??? observations for 68 variables to 3818 observations for 4 variables.

Here are the first few lines of the cleaned dataset of military expenditures in percentage of government expenditures:

2.3.4 Dataset on internet usage

To prepare the dataset on internet usage in the world for merging with the other data, we first import it. We then keep only the years that we are interested in (2000 to 2022), rename the columns, convert internet usage from a percentage to a proportion, and keep only the countries that appear in the list of countries of the main SDG dataset.

Code
D4_0_Internet_usage <- read.csv(here("scripts", "data", "InternetUsage.csv")) %>%
  filter(Year >= 2000, Year <= 2022) %>%
  rename(
    code = Code,
    country = Entity,
    year = Year,
    internet_usage = Individuals.using.the.Internet....of.population.
  ) %>%
  mutate(internet_usage = internet_usage / 100) %>%
  filter(code %in% list_country) %>%
  select(code, country, year, internet_usage)

Here are the first few lines of the cleaned dataset of internet usage:

For this dataset, we reduced the size from 6,570 observations across 4 variables to 3,433 observations for 4 variables.

2.3.5 Dataset on human freedom index

After importing the data from the CATO Institute website, we noticed that, although the file is called “Human Freedom Index 2022”, the available observations only cover 2000 to 2020. We first modify the data to match our other datasets by renaming the country column, fixing its encoding and standardizing the country names.

Code
data <- read.csv(here("scripts", "data", "human-freedom-index-2022.csv"))

# Convert the data to a tibble
datatibble <- tibble(data)

# Rename the column "countries" to "country" to match the other databases
names(datatibble)[names(datatibble) == "countries"] <- "country"

# Make sure the country names are encoded in UTF-8
datatibble$country <- iconv(datatibble$country, to = "UTF-8", sub = "byte")

# Standardize the country names
datatibble <- datatibble %>%
  mutate(country = countrycode(country, "country.name", "country.name"))

Once this is done, we check which countries are present both in these observations and in our main SDG dataset, and we keep only the matching ones.

Code
# Merge by country name
datatibble <- datatibble %>%
  left_join(D1_0_SDG_country_list, by = "country")

datatibble <- datatibble %>% filter(code %in% list_country)
(length(unique(datatibble$code)))
#> [1] 159

# See which ones are missing
list_country_free <- c(unique(datatibble$code))
(missing <- setdiff(list_country, list_country_free))
#> [1] "AFG" "CUB" "MDV" "STP" "SSD" "TKM" "UZB"

# Turkey was missing although it is present in the initial database: this came from a problem when
# standardizing the country names of D1_0_SDG_country_list, which we corrected.
# The other missing countries are: "AFG" "CUB" "MDV" "STP" "SSD" "TKM" "UZB"
D5_0_Human_freedom_index <- datatibble
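The comparison of country lists relies on `setdiff()`, which returns the elements of its first argument that are absent from the second. A minimal sketch with hypothetical code lists (not our real ones) shows why the order of the arguments matters:

```r
# Hypothetical ISO codes, for illustration only
codes_main  <- c("AFG", "BEL", "CHE", "CUB")  # stands in for list_country
codes_other <- c("BEL", "CHE", "FRA")         # stands in for the codes found in another dataset

# Codes of the main list that the other dataset does not cover
missing_codes <- setdiff(codes_main, codes_other)
print(missing_codes)

# Swapping the arguments gives the codes that are present only in the other dataset
extra_codes <- setdiff(codes_other, codes_main)
print(extra_codes)
```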

We then noticed that many of the 141 variables were not relevant for us. We therefore keep only the columns that identify the observation (such as code, year, ...) and the human freedom scores per category (pf for personal freedom, ef for economic freedom).

Code
# erasing useless columns to keep only the general ones. 
D5_0_Human_freedom_index <- select(D5_0_Human_freedom_index, year, country, region, hf_score, pf_rol, pf_ss, pf_movement, pf_religion, pf_assembly, pf_expression, pf_identity, pf_score, ef_government, ef_legal, ef_money, ef_trade, ef_regulation, ef_score, code)

D5_0_Human_freedom_index <- D5_0_Human_freedom_index %>%
  rename(
    pf_law = pf_rol,      # 5th column, renamed to "pf_law"
    pf_security = pf_ss   # 6th column, renamed to "pf_security"
  )

After renaming the columns pf_law and pf_security for readability, we investigate how the NA values are distributed among countries and variables. Having computed the percentage of missing values per country and variable, we found heatmaps to be a good tool for visualizing them.

Code
na_percentage_by_country <- D5_0_Human_freedom_index %>%
  group_by(country) %>%
  select(-code) %>%
  summarise(across(everything(), ~mean(is.na(.))*100))

na_long <- na_percentage_by_country %>%
  pivot_longer(
    cols = -country,
    names_to = "Variable",
    values_to = "NA_Percentage"
  )

overall_na_percentage <- na_long %>%
  group_by(Variable) %>%
  summarize(Avg_NA_Percentage = mean(NA_Percentage, na.rm = TRUE)) %>%
  arrange(desc(Avg_NA_Percentage))
print(overall_na_percentage)
#> # A tibble: 17 x 2
#>    Variable      Avg_NA_Percentage
#>    <chr>                     <dbl>
#>  1 ef_money                 10.4  
#>  2 ef_trade                 10.4  
#>  3 ef_score                 10.4  
#>  4 hf_score                 10.4  
#>  5 pf_score                 10.4  
#>  6 ef_regulation             9.49 
#>  7 ef_government             2.91 
#>  8 ef_legal                  1.71 
#>  9 pf_law                    1.44 
#> 10 pf_identity               0.299
#> 11 pf_assembly               0    
#> 12 pf_expression             0    
#> 13 pf_movement               0    
#> 14 pf_religion               0    
#> 15 pf_security               0    
#> 16 region                    0    
#> 17 year                      0
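The table above averages, for each variable, the per-country percentages of missing values. A minimal base R sketch on a toy data frame (hypothetical columns `x` and `y`) reproduces the two steps. Note that averaging per-country percentages gives each country the same weight, whatever its number of rows.

```r
# Toy data: two countries, two variables (values are made up)
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  x = c(1, NA, NA, NA),
  y = c(1, 2, 3, NA)
)

# Step 1: for each country, percentage of missing values per variable
na_pct <- t(sapply(
  split(toy[c("x", "y")], toy$country),
  function(d) colMeans(is.na(d)) * 100
))
print(na_pct)

# Step 2: average over countries, one value per variable
avg_na_pct <- colMeans(na_pct)
print(avg_na_pct)
```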

Then, to better understand the situation, we order the countries by the number of variables that have at least 50% missing values.

Code
na_long <- na_long %>%
  group_by(country) %>%
  mutate(Count_NA_50_100 = sum(NA_Percentage >= 50 & NA_Percentage <= 100, na.rm = TRUE)) %>%
  ungroup() %>%
  arrange(desc(Count_NA_50_100))

heatmap_ordered_all <- ggplot(na_long, aes(x = reorder(country, -Count_NA_50_100), y = Variable)) +
  geom_tile(aes(fill = NA_Percentage), colour = "white") +
  scale_fill_gradient(low = "white", high = "red") +
  theme_minimal() +
  labs(
    title = "Heatmap of NA Percentages per Country and Variable",
    x = "Countries",
    y = "Variables",
    fill = "NA Percentage"
  ) +
  theme(
    axis.text.x = element_blank(),  # Hide x-axis labels
    axis.text.y = element_text(size = 9)
  )
print(heatmap_ordered_all)

We notice that only a few countries have variables with at least 50% missing values, and that most of the missing values concern the ef variables (economic freedom). We then produce a second heatmap restricted to these countries, and count for each of them the number of variables with at least 50% NAs.

Code
na_long_filtered <- na_long %>%
  group_by(country) %>%
  mutate(Count_NA_50_100 = sum(NA_Percentage >= 50 & NA_Percentage <= 100, na.rm = TRUE)) %>%
  filter(Count_NA_50_100 > 0) %>%
  ungroup() %>%
  arrange(desc(Count_NA_50_100))

heatmap_ordered_filtered <- ggplot(na_long_filtered, aes(x = reorder(country, -Count_NA_50_100), y = Variable)) +
  geom_tile(aes(fill = NA_Percentage), colour = "white") +
  scale_fill_gradient(low = "white", high = "red") +
  theme_minimal() +
  labs(
    title = "Heatmap of NA Percentages per Country and Variable",
    x = "Countries",
    y = "Variables",
    fill = "NA Percentage"
  ) +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1),
    axis.text.y = element_text(size = 7)
  )
print(heatmap_ordered_filtered)

country_na_count <- na_long %>%
  filter(NA_Percentage >= 50) %>%
  group_by(country) %>%
  summarise(Count_NA_50_100 = n()) %>%
  arrange(desc(Count_NA_50_100))
print(country_na_count)
#> # A tibble: 13 x 2
#>    country  Count_NA_50_100
#>    <chr>              <int>
#>  1 Comoros                8
#>  2 Djibouti               8
#>  3 Somalia                8
#>  4 Belarus                6
#>  5 Guinea                 6
#>  6 Iraq                   6
#>  7 Laos                   6
#>  8 Sudan                  6
#>  9 Bhutan                 5
#> 10 Liberia                5
#> 11 Bahamas                1
#> 12 Belize                 1
#> 13 Brunei                 1
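The counting logic behind this table can be sketched in base R on a small hypothetical matrix of NA percentages (countries in rows, variables in columns). The names and values below are made up for illustration.

```r
# Hypothetical NA percentages: countries in rows, variables in columns
na_pct <- matrix(
  c(100, 60,  0,
     10, 50,  0,
      0,  0, 20),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("v1", "v2", "v3"))
)

# Number of variables per country with at least 50% of NAs
count_50 <- rowSums(na_pct >= 50)

# Keep only the countries concerned, sorted in decreasing order
flagged <- sort(count_50[count_50 > 0], decreasing = TRUE)
print(flagged)
```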

We conclude that 13 countries have at least one variable with 50% or more missing values. After discussion, we concluded that many of these 13 countries would not be selected anyway, because they also have many missing values in our main dataset. We therefore decided to merge this data with the other datasets and to finish the cleaning afterwards.

Here are the first few lines of the partially cleaned dataset on Human Freedom Index scores:

For this dataset, we reduced the size from 3,465 observations across 141 variables to 3,339 observations for 19 variables.

2.3.6 Dataset on Disasters

For the dataset on disasters, we imported the data from Kaggle, because we could not access the original dataset, which is private and comes from the EOSDIS system, an interactive interface for browsing full-resolution, global, daily satellite images from NASA. Once we made sure that our file called “Disasters” was converted into a data frame, we selected the columns that we were interested in.

Code
Disasters <- as.data.frame(read.csv(here("scripts", "data", "Disasters.csv"))) %>%
  select(Year, Country, ISO, Location, Continent, Disaster.Subgroup, Disaster.Type, Total.Deaths, No.Injured, No.Affected, No.Homeless, Total.Affected, Total.Damages...000.US..)

Because the file lists all the disasters in each country over the years (1970 to 2021) and we want to focus on a specific period, we filter the data to keep the years between 2000 and 2022. We then rearrange the data, changing the data types and names of the columns to match our other datasets.

Code
# Rearrange the columns, changed the type of data, renamed the columns
Rearanged_Disasters <- Disasters %>%
  filter(Year >= 2000 & Year <= 2022) %>%
  mutate(
    code = as.character(ISO),
    country = as.character(Country),
    year = as.integer(Year),
    continent = as.character(Continent),
    disaster.subgroup = as.character(Disaster.Subgroup),
    disaster.type = as.character(Disaster.Type),
    location = as.character(Location),
    total.deaths = as.numeric(Total.Deaths),
    no.injured = as.numeric(No.Injured),
    no.affected = as.numeric(No.Affected),
    no.homeless = as.numeric(No.Homeless),
    total.affected = as.numeric(Total.Affected),
    total.damages = as.numeric(Total.Damages...000.US..)
  )

We then group the data by “year”, “code”, “country” and “continent” and summarize it. Here we re-select specific columns, because our first selection was still too wide and some variables, such as disaster.subgroup and disaster.type, were not relevant. We arrange the rows by “code”, “country”, “year” and “continent” to match the other datasets.

Code
Disasters <- Rearanged_Disasters %>%
  group_by(year,code, country, continent) %>%
  summarize(
    total_deaths = sum(total.deaths, na.rm = TRUE),
    no_injured = sum(no.injured, na.rm = TRUE),
    no_affected = sum(no.affected, na.rm = TRUE),
    no_homeless = sum(no.homeless, na.rm = TRUE),
    total_affected = sum(total.affected, na.rm = TRUE),
    total_damages = sum(total.damages, na.rm = TRUE)
  ) 

D6_0_Disasters <- Disasters %>%
  select(code, country, year, continent, total_deaths, no_injured, no_affected, no_homeless, total_affected, total_damages) %>%
  arrange(code, country, year, continent)
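The aggregation step collapses several events in the same country and year into a single row. A minimal base R sketch on made-up events (the column names are shortened and hypothetical) shows the effect; it uses `aggregate()` instead of the dplyr pipeline only to stay free of package dependencies.

```r
# Made-up events: two in CHE in 2001 (one without a death toll), one in FRA, one in CHE in 2002
events <- data.frame(
  year   = c(2001, 2001, 2001, 2002),
  code   = c("CHE", "CHE", "FRA", "CHE"),
  deaths = c(5, NA, 2, 1)
)

# One row per (year, code); na.action = na.pass keeps the rows with NA
# so that sum(..., na.rm = TRUE) can drop the missing values itself
yearly <- aggregate(deaths ~ year + code, data = events,
                    FUN = sum, na.rm = TRUE, na.action = na.pass)
print(yearly)
```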

Finally we filtered our disasters data to keep only the countries that are present in our main dataset. We analysed the missing countries and identified three countries (BHR, BRN, MLT) that are unexpectedly missing.

Code
D6_0_Disasters <- D6_0_Disasters %>% filter(code %in% list_country)
length(unique(D6_0_Disasters$code))
#> [1] 163

# Here we see which countries are missing
list_country_disasters <- c(unique(D6_0_Disasters$code))
(missing <- c(missing,setdiff(list_country, list_country_disasters)))
#>  [1] "AFG" "CUB" "MDV" "STP" "SSD" "TKM" "UZB" "BHR" "BRN" "MLT"

Here are the first few lines of the cleaned dataset on Disasters:

2.3.7 Dataset on COVID

This dataset contains information on the COVID19 pandemic between 2020 and 2022. The observations are daily (year, month, day). After importing the database, we parse the date, which is in the format YYYY-MM-DD, and keep only the year.

Code
COVID <- read.csv(here("scripts", "data", "COVID.csv")) %>%
  select(iso_code, location, date, new_cases_per_million, new_deaths_per_million, stringency_index) %>%
  mutate(date = as.integer(year(date)))

We perform a first round of investigation of the missing values before aggregating the values by year. We begin with the variables “cases per million” and “deaths per million”: for each country, we have either only missing values or a very low percentage of missing values (~1%), so we can compute the sum over each year and ignore the missing values without altering the data. Note that with sum(..., na.rm = TRUE), a country-year in which all the values are missing is summed to 0 rather than NA. We then look at the “stringency” variable, for which we have 3 scenarios:

  1. ~20% missings: we ignore the missing values when computing the mean, which still gives a good idea of the stringency each year (a few missing days are not a problem, because stringency cannot evolve that fast).

  2. all are missing: we can ignore the missing values when computing the mean, because it will still return a missing value.

  3. almost all are missing: here the mean does not make sense, so we will replace the values by NAs to be coherent. The countries with this issue are ERI, GUM, PRI and VIR. We check whether they are in our main dataset and, since none of them are, we can ignore the issue: the lines will be removed later anyway.
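The behaviour of `sum()` and `mean()` with `na.rm = TRUE` in these scenarios can be checked on toy vectors (made-up values). The point of this sketch is that the two functions treat an all-missing group differently.

```r
all_missing  <- c(NA_real_, NA_real_, NA_real_)  # e.g. a country with no reported values
some_missing <- c(40, NA, 60)                    # e.g. a few days without a value

sum_all   <- sum(all_missing, na.rm = TRUE)    # an empty sum
mean_all  <- mean(all_missing, na.rm = TRUE)   # 0 / 0, which is.na() treats as missing
mean_some <- mean(some_missing, na.rm = TRUE)  # mean of the observed values only

print(c(sum_all = sum_all, mean_all = mean_all, mean_some = mean_some))
```

A yearly sum of 0 can therefore mean either "no cases" or "no data", whereas an all-missing mean stays recognisable as missing.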

We aggregate the daily observations of each year into one observation per country and year, using the sum for cases and deaths and the mean for stringency.

Code
COVID1 <- COVID %>%
  group_by(iso_code) %>%
  summarize(NaDeaths = round(mean(is.na(new_deaths_per_million)),3),
            NaCases = round(mean(is.na(new_cases_per_million)), 3),
            NaStringency = round(mean(is.na(stringency_index)), 3)) %>%
  pivot_longer(cols = starts_with("Na"), names_to = "Variable", values_to = "NaValue")%>%
  filter(NaValue!=0)

issue_list <- c("ERI", "GUM", "PRI", "VIR")
is.element(issue_list, list_country)
#> [1] FALSE FALSE FALSE FALSE

COVID <- COVID %>%
  group_by(location, date) %>%
  mutate(
    cases_per_million = sum(new_cases_per_million, na.rm = TRUE),
    deaths_per_million = sum(new_deaths_per_million, na.rm = TRUE),
    stringency = mean(stringency_index, na.rm = TRUE)
  )%>%
  ungroup()

###

# Create a bubble plot
plot_ly(COVID1, x=~Variable, y=~NaValue,
        type = "scatter",
        marker = list(color="blue", opacity=0.1, size = 10))

Now that all the variables of interest are aggregated by year, we remove all the variables that we don’t need and rename all the remaining variables to match the main dataset.

Code
COVID <- COVID %>%
  group_by(location, date) %>%
  distinct(date, .keep_all = TRUE) %>%
  ungroup()

COVID <- COVID %>% select(-c(new_cases_per_million, new_deaths_per_million, stringency_index))

colnames(COVID) <- c("code", "country", "year", "cases_per_million", "deaths_per_million", "stringency")

We remove the years that exceed 2022, we make sure that the country codes are all iso codes with 3 letters (we observe that sometimes they are preceded by “OWID_”) and we standardize the country codes.

Code
COVID <- COVID[COVID$year <= 2022, ]

COVID$code <- gsub("OWID_", "", COVID$code)

COVID$code <- countrycode(
  sourcevar = COVID$code,
  origin = "iso3c",
  destination = "iso3c"
)
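The prefix removal can be sketched as follows on a few illustrative codes (not an extract of our data). Anchoring the pattern with `^` is a slightly stricter variant of the `gsub()` call above: it removes "OWID_" only at the start of a code.

```r
# Illustrative codes: some carry the "OWID_" prefix, others are already plain
codes <- c("CHE", "OWID_KOS", "FRA")

# Remove the prefix only when it appears at the start of the string
clean <- sub("^OWID_", "", codes)
print(clean)

# Simple sanity check: every code should now have exactly three characters
all_three <- all(nchar(clean) == 3)
print(all_three)
```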

We remove the observations of countries that aren’t in our main dataset on SDGs and find that all the 166 countries that we have in the main SDG dataset are also in this one.

Code
COVID <- COVID %>% filter(code %in% list_country)
length(unique(COVID$code))
#> [1] 166

We perform a second round of missing values investigation and find that there are no missing values except for stringency, where 4.19% are missing. For the 7 countries concerned, either all values or 50% of them are missing, so these countries won’t be included when analyzing the effect of stringency on the SDG scores.

Code
mean(is.na(COVID$cases_per_million))
#> [1] 0
mean(is.na(COVID$deaths_per_million))
#> [1] 0
mean(is.na(COVID$stringency))
#> [1] 0.0419

COVID4 <- COVID %>%
  group_by(code) %>%
  summarize(NaCOVID = mean(is.na(stringency))) %>%
  filter(NaCOVID != 0)

ggplot(COVID4, aes(x = reorder(code, NaCOVID), y = NaCOVID)) +
  geom_bar(stat = "identity", fill = "lightgreen", color = "black") +
  labs(title = "Proportion of Missing Values in 'stringency' by 'code'",
       x = "Code",
       y = "Proportion of Missing Values") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

D7_0_COVID <- COVID

Here are the first few lines of the cleaned dataset on COVID19:

2.3.8 Dataset on Conflicts

For our conflicts dataset, we imported the data from the World Bank data catalog. Once we made sure that our file called “Conflicts” was converted into a data frame, we selected the columns that we were interested in.

Code
Conflicts <- read.csv(here("scripts", "data", "Conflicts.csv")) %>%
  as.data.frame() %>%
  select(year, country, ongoing, gwsum_bestdeaths, pop_affected, 
         peaceyearshigh, area_affected, maxintensity, maxcumulativeintensity)

Our file lists all the conflicts and their consequences per country over the years 2000 to 2016. We could not find a better or more complete dataset, and since we consider conflicts as events, we only take into account the results between 2000 and 2016. We then rearrange the data, changing the data types and names of the columns to match our other datasets. We group the data by “year” and “country”, re-select some variables and summarize the data.

Code
Rearanged_Conflicts <- Conflicts %>%
  filter(year >= 2000 & year <= 2022)%>%
  mutate(
    ongoing = as.integer(ongoing),
    country = as.character(country),
    year = as.integer(year),
    gwsum_bestdeaths = as.numeric(gwsum_bestdeaths),
    pop_affected = as.numeric(pop_affected),
    area_affected = as.numeric(area_affected),
    maxintensity = as.numeric(maxintensity),
    )

# Group the data by "year", "country" and summarize the data
Conflicts <- Rearanged_Conflicts %>%
  group_by(year, country) %>%
  summarize(
    ongoing = sum (ongoing, na.rm = TRUE),
    sum_deaths = sum(gwsum_bestdeaths, na.rm = TRUE),
    pop_affected = sum(pop_affected, na.rm = TRUE),
    area_affected = sum(area_affected, na.rm = TRUE),
    maxintensity = sum(maxintensity, na.rm = TRUE),
  )

We then select specific columns from the summarized data and arrange the rows by country and year. To make the dataset compatible with the main one and let the merging phase succeed, we fix the encoding of the country names and standardize them. We then merge by country name to retrieve the country codes, and finally keep only the countries present in our main dataset. In the end, only one additional country is missing, because it was not in the initial conflicts database: BLR.

Code
conflicts <- Conflicts %>%
  select(country, year, ongoing, sum_deaths, pop_affected, area_affected, maxintensity) %>%
  arrange(country, year)

conflicts$country <- iconv(conflicts$country, to = "UTF-8", sub = "byte")

conflicts <- conflicts %>%
  mutate(country = countrycode(country, "country.name", "country.name"))

conflicts <- conflicts %>%
  left_join(D1_0_SDG_country_list, by = "country")

conflicts <- conflicts %>%
  select(code, country, year, ongoing, sum_deaths, pop_affected, area_affected, maxintensity) %>%
  arrange(code, country, year)


D8_0_Conflicts <- conflicts %>% filter(code %in% list_country)
(length(unique(conflicts$code)))
#> [1] 166

# See which countries are missing
list_country_conflicts <- c(unique(conflicts$code))
(missing <- c(missing, setdiff(list_country, list_country_conflicts)))
#>  [1] "AFG" "CUB" "MDV" "STP" "SSD" "TKM" "UZB" "BHR" "BRN" "MLT"
#> [11] "BLR"

Here are the first few lines of the cleaned dataset on Conflicts:

2.3.9 Merge data

By merging our eight pre-cleaned datasets, we create a final database.

Code
D2_1_Unemployment_rate$country <- NULL
merge_1_2 <- D1_0_SDG |> left_join(D2_1_Unemployment_rate, join_by(code, year))

D3_1_GDP_per_capita$country <- NULL
merge_12_3 <- merge_1_2 |> left_join(D3_1_GDP_per_capita, join_by(code, year))

D3_2_Military_Expenditure_Percent_GDP$country <- NULL
merge_12_3 <- merge_12_3 |> left_join(D3_2_Military_Expenditure_Percent_GDP, join_by(code, year)) 

D3_3_Miliraty_Expenditure_Percent_Gov_Exp$country <- NULL
merge_12_3 <- merge_12_3 |> left_join(D3_3_Miliraty_Expenditure_Percent_Gov_Exp, join_by(code, year)) 

D4_0_Internet_usage$country <- NULL
merge_123_4 <- merge_12_3 |> left_join(D4_0_Internet_usage, join_by(code, year)) 

D5_0_Human_freedom_index$country <- NULL
merge_1234_5 <- merge_123_4 |> left_join(D5_0_Human_freedom_index, join_by(code, year)) 

D6_0_Disasters$country <- NULL
merge_12345_6 <- merge_1234_5 |> left_join(D6_0_Disasters, join_by(code, year)) 

D7_0_COVID$country <- NULL
D7_0_COVID <- D7_0_COVID |> distinct(code, year, .keep_all = TRUE)
merge_123456_7 <- merge_12345_6 |> left_join(D7_0_COVID, join_by(code, year)) 

D8_0_Conflicts$country <- NULL
all_Merge <- merge_123456_7 |> left_join(D8_0_Conflicts, join_by(code, year)) 

all_Merge <- all_Merge %>% filter(!code %in% missing)
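The successive left joins all follow the same pattern, which can be sketched in base R with two toy tables (made-up values, hypothetical stand-ins for the SDG and COVID data). The sketch checks that the key (code, year) is unique in the auxiliary table before joining, since a duplicated key would duplicate rows of the main table.

```r
# Toy stand-ins for the main SDG table and for one auxiliary table
sdg <- data.frame(code = c("CHE", "CHE", "FRA", "FRA"),
                  year = c(2019, 2020, 2019, 2020),
                  overallscore = c(79, 80, 78, 79))
covid <- data.frame(code = c("CHE", "FRA"),
                    year = c(2020, 2020),
                    cases_per_million = c(52000, 40000))

# The join key should identify at most one row in the auxiliary table
stopifnot(anyDuplicated(covid[c("code", "year")]) == 0)

# Left join: every row of sdg is kept, unmatched rows get NA
merged <- merge(sdg, covid, by = c("code", "year"), all.x = TRUE)
print(merged)
```

The NAs created for the unmatched years are introduced by the join itself, not by the source data.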

2.3.10 Cleaning of the final database

We replace the NAs of the COVID columns by 0, because these are not real missing values: they were introduced by the merge for the years before COVID.

Code
all_Merge <- all_Merge %>%
  mutate(
    cases_per_million = ifelse(is.na(cases_per_million), 0, cases_per_million),
    deaths_per_million = ifelse(is.na(deaths_per_million), 0, deaths_per_million),
    stringency = ifelse(is.na(stringency), 0, stringency)
  )

Since we took the information on the continent and region from databases other than the main one, we complete this information for the whole final dataset.

Code
all_Merge <- all_Merge %>%
  group_by(country) %>%
  mutate(continent = ifelse(is.na(continent), first(na.omit(continent)), continent)) %>%
  ungroup()

all_Merge <- all_Merge %>%
  group_by(country) %>%
  mutate(region = ifelse(is.na(region), first(na.omit(region)), region)) %>%
  ungroup()
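The filling logic (take, within each country, the first non-missing value) can be sketched in base R with `ave()` on a toy table. The country and continent values below are only illustrative.

```r
# Toy table: the continent is known for some rows only
toy <- data.frame(
  country   = c("Chile", "Chile", "Chile", "Kenya", "Kenya"),
  continent = c(NA, "Americas", NA, "Africa", NA)
)

# Within each country, replace NA by the first non-missing value of that country
toy$continent <- ave(toy$continent, toy$country, FUN = function(v) {
  ifelse(is.na(v), v[!is.na(v)][1], v)
})
print(toy)
```

If a country has no non-missing value at all, `v[!is.na(v)][1]` is itself NA, so its rows simply stay missing.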

We order the columns of the database, beginning with the information on the country, the year, the continent and the region.

Code
all_Merge <- all_Merge %>%
  select(code, year, country, continent, region, everything())

write.csv(all_Merge, file = here("scripts","data","all_Merge.csv"))

Here are the first few lines of the final dataset:

Final structure of our merged database: each of the 166 countries from D1_1_SDG is observed each year from 2000 to 2022, so each row has a key composed of (code, year) that uniquely identifies an observation. The other columns are the variables listed above. Because some countries have a lot of missing information, we will have to eliminate some of them, but we will still have more than 2000 rows in our database.

2.3.11 Treatment of missing values

We load our final database and visualize the missing values.

Code
all_Merge <- read.csv(here("scripts","data","all_Merge.csv"))

all_Merge <- all_Merge %>% select(-c(X))

# List the goal columns that contain no NAs, to summarize them in one column and simplify the visualization
goal_vars <- all_Merge %>%
  select(starts_with("goal")) %>%
  select(where(~ !anyNA(.x))) %>%
  colnames()
to_plot_missing <- all_Merge %>%
  mutate(Goals_without_NAs = rowSums(!is.na(select(., all_of(goal_vars))))) %>%
  select(-c(goal2, goal3, goal4, goal5, goal6, goal7, goal8, goal9, goal11, goal12, goal13, goal15, goal16, goal17))

vis_dat(to_plot_missing, warn_large_data = FALSE) + scale_fill_brewer(palette = "Paired") +
  theme(
    axis.text.x = element_text(angle = 90, size = 6),
    legend.text = element_text(size = 8),  # Adjust the size of legend text
    legend.title = element_text(size = 10) 
  )
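Identifying the goal columns that contain no missing values is a column-wise operation. A minimal base R sketch on a toy table (hypothetical values) shows one way to do it, assuming, as in our data, that the goal columns all start with "goal":

```r
# Toy table: goal1 has a missing value, goal2 and goal3 are complete
toy <- data.frame(
  goal1 = c(50, NA, 70),
  goal2 = c(60, 65, 70),
  goal3 = c(80, 82, 85),
  other = c(NA, 1, 2)
)

# Restrict to the goal columns, then keep those with zero missing values
goal_cols <- grep("^goal", names(toy), value = TRUE)
goals_complete <- goal_cols[colSums(is.na(toy[goal_cols])) == 0]
print(goals_complete)
```

Filtering rows instead (keeping the rows without NAs) would leave the list of column names unchanged, which is why the selection has to be made on columns.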

We subset our database according to the data that we need to answer the different questions. This will help us deal with the missing values.

For question 1, we only keep the years until 2020, because most of the explanatory variables that we want to use (those coming from the human freedom index) only have values until 2020.

Code
data_question1 <- all_Merge %>%
  filter(year<=2020) %>%
  select(-c(total_deaths, no_injured, no_affected, no_homeless, total_affected, total_damages, cases_per_million, deaths_per_million, stringency, ongoing, sum_deaths, pop_affected, area_affected, maxintensity))

For questions 2 and 4, we use the main data from the SDG database.

Code
data_question24 <- all_Merge %>%
  select(c(code, year, country, continent, region, overallscore, goal1, goal2, goal3, goal4, goal5, goal6, goal7, goal8, goal9, goal10, goal11, goal12, goal13, goal15, goal16, goal17))

For question 3, we create 3 distinct databases according to the type of event that we will analyse: disasters, COVID19 and conflicts. For the disasters, we only keep the years until 2021, because we have no data after this date. For the conflicts, we only keep the years until 2016, for the same reason.

Code
# Disasters
data_question3_1 <- all_Merge %>%
  filter(year<=2021) %>%
  select(c(code, year, country, continent, region, overallscore, goal1, goal2, goal3, goal4, goal5, goal6, goal7, goal8, goal9, goal10, goal11, goal12, goal13, goal15, goal16, goal17, total_deaths, no_injured, no_affected, no_homeless, total_affected, total_damages))

# COVID
data_question3_2 <- all_Merge %>%
  select(c(code, year, country, continent, region, overallscore, goal1, goal2, goal3, goal4, goal5, goal6, goal7, goal8, goal9, goal10, goal11, goal12, goal13, goal15, goal16, goal17, cases_per_million, deaths_per_million, stringency))

# Conflicts 
data_question3_3 <- all_Merge %>%
  filter(year<=2016) %>%
  select(c(code, year, country, continent, region, overallscore, goal1, goal2, goal3, goal4, goal5, goal6, goal7, goal8, goal9, goal10, goal11, goal12, goal13, goal15, goal16, goal17, ongoing, sum_deaths, pop_affected, area_affected, maxintensity))

2.3.11.1 Data for question 1

We begin by visualizing the missing values. To have a less messy graph, we group all the goals without NAs into one single variable.

Code
# Compute the percentage of missing values per variable; goals without NAs are grouped under one label to simplify the visualization
variable_names <- names(data_question1)
missing_percentages <- sapply(data_question1, function(col) mean(is.na(col)) * 100)

missing_data_summary <- data.frame(
  Variable = variable_names,
  Missing_Percentage = missing_percentages
)

missing_data_summary <- missing_data_summary %>%
  mutate(VariableGroup = ifelse(startsWith(Variable, "goal") & Missing_Percentage == 0, "Goals without NAs", as.character(Variable)))

ggplot(data = missing_data_summary, aes(x = reorder(VariableGroup, Missing_Percentage), y = Missing_Percentage, fill = Missing_Percentage)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = ifelse(Missing_Percentage > 1, sprintf("%.1f%%", Missing_Percentage), ""),
                y = Missing_Percentage),
            position = position_stack(vjust = 1),  # Adjust vertical position
            color = "white",  # Text color
            size = 2,          # Text size
            hjust = 1.05) +
  labs(title = "Percentage of Missing Values by Variable",
       x = "Variable",
       y = "Missing Percentage") +
  theme_minimal() +
  theme(axis.text.y = element_text(hjust = 1, size=6 ),
        legend.text = element_text(size = 8),
        legend.title = element_text(size = 10)) +
  labs(fill = "% NAs") +
  coord_flip()

We create a column with the number of missing values by country over all the variables, except goal 1 and goal 10 that we already discussed. We decide to remove the countries that have more than 50 missing values.

Code
see_missing1_1 <- data_question1 %>%
  group_by(code) %>%
  summarise(across(-c(year, country, continent, region, population, overallscore, goal1, goal2, goal3, goal4, goal5, goal6, goal7, goal8, goal9, goal10, goal11, goal12, goal13, goal15, goal16, goal17),
                   ~ sum(is.na(.)))) %>%
  mutate(num_missing = rowSums(across(-code))) %>%
  filter(num_missing > 50)

data_question1 <- data_question1 %>% filter(!code %in% see_missing1_1$code)

list_country_deleted <- c(unique(see_missing1_1$code))
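The removal rule (count the NAs per country over all explanatory variables, then drop the countries above a cut-off) can be sketched in base R on a toy table. The column names and the cut-off of 2 are hypothetical and chosen only to fit the small example; the report itself uses 50.

```r
# Toy table: country A has many missing values, country B only one
toy <- data.frame(
  code     = c("A", "A", "A", "B", "B", "B"),
  gdp      = c(1, NA, NA, 1, 2, 3),
  internet = c(NA, NA, 0.3, 0.1, NA, 0.2)
)

# Count the NAs per country and variable, then sum them over the variables
na_counts   <- rowsum(+is.na(toy[c("gdp", "internet")]), group = toy$code)
num_missing <- rowSums(na_counts)
print(num_missing)

# Hypothetical cut-off for this toy example (the report uses 50)
threshold <- 2
to_drop  <- names(num_missing)[num_missing > threshold]
toy_kept <- toy[!toy$code %in% to_drop, ]
print(toy_kept)
```

Because the count is a total over variables, a country can be dropped either for one almost empty variable or for moderate gaps spread over several variables.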

Here is the graph that allows us to visualize the countries that have missing values, how many and for which variables, when there are more than 50 NAs in total.

Code
ggplot(see_missing1_1, aes(x = num_missing , y = reorder(code, num_missing), fill = num_missing)) +
    geom_bar(stat = "identity") + 
    scale_fill_gradient(low = "lightgreen", high = "darkgreen") +
    theme_minimal() +
  theme(axis.text.y = element_text(hjust = 1, size=8 ),
        legend.text = element_text(size = 8),
        legend.title = element_text(size = 10)) +
    labs(title = "Number of missing values per country containing at least 50 NAs", x = "Number of Missing Values", y = "Countries")

Now, looking at the remaining countries that have missing values and at their number across all variables, we decide to remove MiliratyExpenditurePercentGovExp, because it has too many missing values and contains information similar to MilitaryExpenditurePercentGDP.

Code
see_missing1_2 <- data_question1 %>%
  group_by(code) %>%
  summarise(across(-c(year, country, continent, region, population, overallscore, goal1, goal2, goal3, goal4, goal5, goal6, goal7, goal8, goal9, goal10, goal11, goal12, goal13, goal15, goal16, goal17),
                   ~ sum(is.na(.)))) %>%
  mutate(num_missing = rowSums(across(-code))) %>%
  filter(num_missing > 0)

data_question1 <- data_question1 %>% select(-MiliratyExpenditurePercentGovExp)

Here is the ggplot that helps us to visualize the countries that have missing values after removing the countries with more than 50 NAs.

Code
ggplot(see_missing1_2, aes(x = num_missing , y = reorder(code, num_missing), fill = num_missing)) +
    geom_bar(stat = "identity", width = 0.5) + 
    scale_fill_gradient(low = "lightgreen", high = "darkgreen") +
    theme_minimal() +
  theme(axis.text.y = element_text(hjust = 1, size= 6 ),
        legend.text = element_text(size = 8),
        legend.title = element_text(size = 10)) +
        labs(title = "Number of missing values per country", x = "Number of Missing Values", y = "Countries")

2.3.11.1.1 GDP per capita

Only Venezuela has missing values that we cannot fill, so we delete the country.

Code
question1_missing_GDP <- data_question1 %>%
  group_by(code) %>%
  summarize(NaGDPpercapita = mean(is.na(GDPpercapita)))%>%
  filter(NaGDPpercapita != 0)

data_question1 <- data_question1 %>% filter(code!="VEN")

list_country_deleted <- c(list_country_deleted, "VEN")

2.3.11.1.2 Military expenditure in % of GDP

To begin with, we delete the countries with more than 30% missing values.

Code
question1_missing_Military <- data_question1 %>%
  group_by(code) %>%
  summarize(NaMilitary = mean(is.na(MilitaryExpenditurePercentGDP)))%>%
  filter(NaMilitary != 0)

data_question1 <- data_question1 %>% filter(code!="BRB" & code!="CRI" & code!="HTI" & code!="ISL" & code!="PAN" & code!="SYR") 

list_country_deleted <- c(list_country_deleted, "BRB", "CRI", "HTI", "ISL", "PAN", "SYR") 

Then, we look at the distribution of the variable per region. Since all the distributions are skewed, we replace the missing values with the median of the region for the countries with less than 30% missing values.

Code
question1_missing_Military <- data_question1 %>%
  group_by(code) %>%
  mutate(PercentageMissing = mean(is.na(MilitaryExpenditurePercentGDP))) %>% # Column % NAs
  ungroup() %>%
  group_by(region) %>%
  filter(sum(PercentageMissing, na.rm = TRUE) > 0)

Freq_Missing_Military <- ggplot(data = question1_missing_Military) +
  geom_histogram(aes(x = MilitaryExpenditurePercentGDP, 
                     fill = cut(PercentageMissing,
                                breaks = c(0, 0.1, 0.2, 0.3, 1),
                                labels = c("0-10%", "10-20%", "20-30%", "30-100%"))),
                 bins = 30) +
  labs(title = "Distribution of Military expenditures in % of GDP", x = "Military expenditures in % of GDP", y = "Frequency") +
  scale_fill_manual(values = c("0-10%" = "blue", "10-20%" = "green", "20-30%"="red","30-100%" = "black"), labels = c("0-10%", "10-20%", "20-30%","30-100%")) +
  guides(fill = guide_legend(title = "% missings")) +
  facet_wrap(~ region, nrow = 3)

print(Freq_Missing_Military)

data_question1 <- data_question1 %>%
  group_by(code) %>%
  mutate(
    PercentageMissingByCode = mean(is.na(MilitaryExpenditurePercentGDP))
  ) %>%
  ungroup() %>%  
  group_by(region) %>%
  mutate(
    MedianByRegion = median(MilitaryExpenditurePercentGDP, na.rm = TRUE),
    MilitaryExpenditurePercentGDP = ifelse(
      PercentageMissingByCode < 0.3 & !is.na(MilitaryExpenditurePercentGDP),
      MilitaryExpenditurePercentGDP,
      ifelse(PercentageMissingByCode < 0.3, MedianByRegion, MilitaryExpenditurePercentGDP)
    )
  ) %>%
  select(-PercentageMissingByCode, -MedianByRegion)
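
To make the regional-median step concrete, here is a minimal sketch on made-up data, written in base R. The data frame `toy`, its region names and its values are assumptions for illustration only; the point is that the median is not pulled by a single extreme observation, which is why we prefer it for skewed distributions.

```r
# Toy data: two regions, one extreme value in "North" (all values made up)
toy <- data.frame(
  code   = rep(c("AAA", "BBB", "CCC"), each = 4),
  region = rep(c("North", "North", "South"), each = 4),
  value  = c(1, 2, NA, 3,      # AAA (North): one gap
             2, 2, 100, 2,     # BBB (North): one extreme year
             5, NA, 7, 6)      # CCC (South): one gap
)

# Median of the observed values within each region, aligned to the rows
region_median <- ave(toy$value, toy$region,
                     FUN = function(x) median(x, na.rm = TRUE))

# Fill only the gaps; observed values are left untouched
toy$value_filled <- ifelse(is.na(toy$value), region_median, toy$value)
print(toy)
```

In this toy example the gap in the North region is filled with 2 (the regional median), whereas the regional mean would have been 16 because of the single value of 100.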

2.3.11.1.3 Internet usage

Only a low percentage of values is missing.

Code
question1_missing_Internet <- data_question1 %>%
  group_by(code) %>%
  summarize(NaInternet = mean(is.na(internet_usage)))%>%
  filter(NaInternet != 0)

We look at the evolution of the variable over time. Since every series increases almost linearly, we fill the missing values by linear interpolation, except for CIV, which we delete.

Code
question1_missing_Internet <- data_question1 %>%
  group_by(code) %>%
  mutate(PercentageMissing = mean(is.na(internet_usage))) %>% # Column % NAs
  filter(code %in% question1_missing_Internet$code)

Evol_Missing_Internet <- ggplot(data = question1_missing_Internet) +
  geom_line(aes(x = year, y = internet_usage, 
                 color = cut(PercentageMissing,
                             breaks = c(0, 0.1, 0.2, 0.3, 1),
                             labels = c("0-10%", "10-20%", "20-30%", "30-100%")))) +
  labs(title = "Evolution of internet usage over time", x = "Years from 2000 to 2022", y = "Internet usage") +
  scale_color_manual(values = c("0-10%" = "blue", "10-20%" = "green", "20-30%" = "red", "30-100%" = "black"),
                     labels = c("0-10%", "10-20%", "20-30%", "30-100%")) +
  guides(color = guide_legend(title = "% missings")) +
  scale_x_continuous(breaks=NULL)+
  facet_wrap(~ code, nrow = 4)

print(Evol_Missing_Internet)

list_code <- setdiff(unique(question1_missing_Internet$code), "CIV")
for (i in list_code) {
  country_data <- data_question1 %>% filter(code == i)
  interpolated_data <- na.interp(country_data$internet_usage)
  data_question1[data_question1$code == i, "internet_usage"] <- interpolated_data
}

data_question1 <- data_question1 %>% filter(code!="CIV")

list_country_deleted <- c(list_country_deleted, "CIV") 
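
The interpolation itself can be illustrated on a single made-up series. We use `na.interp()` from the forecast package above; the sketch below uses base R's `approx()` instead so that it runs without extra packages, on the assumption that both give the same result for a simple non-seasonal series like this one. The vector `usage` is hypothetical.

```r
year  <- 2000:2006
usage <- c(10, NA, 14, NA, NA, 20, 22)   # made-up internet usage (%)

observed <- !is.na(usage)

# Linear interpolation at the missing years;
# rule = 2 would carry the nearest observed value to gaps at either end
filled <- usage
filled[!observed] <- approx(x = year[observed], y = usage[observed],
                            xout = year[!observed], rule = 2)$y
print(filled)
```

Each gap is placed on the straight line joining its two observed neighbours, which is only reasonable when the series is close to linear, as is the case for the countries we interpolate.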

2.3.11.1.4 Human freedom index

First, we remove hf_score, pf_score and ef_score, because they have many missing values and, since these variables summarize the other ones, deleting them will not make us lose information.

Code
data_question1 <- data_question1 %>% 
  select(-c(hf_score, pf_score, ef_score))
2.3.11.1.4.1 Personal freedom: law

The variable pf_law has (many) NAs, but only for one country: BLZ, so we decide to remove it.

Code
data_question1 <- data_question1 %>%
  filter(code!="BLZ")

list_country_deleted <- c(list_country_deleted, "BLZ") 
2.3.11.1.4.2 Economic freedom: government

Only KGZ and SRB have missing values. We plot the values over time and fill each gap with the value of the next available year, since the two countries have only one and two missing values, respectively.

Code
data_question1 %>%
  filter(code %in% c("KGZ", "SRB")) %>%
  ggplot(aes(x = year, y = ef_government)) +
  geom_point(color = "green") +
  facet_wrap(~ code, nrow = 1) +
  labs(title = "Evolution of economic freedom: government over time", x = "Years", y = "ef_gov")

data_question1 <- data_question1 %>%
  mutate(ef_government = ifelse(code == "KGZ" & year == 2000 & is.na(ef_government), ef_government[which(code == "KGZ" & year == 2001)], ef_government))
data_question1 <- data_question1 %>%
  mutate(ef_government = ifelse(code == "SRB" & year == 2000 & is.na(ef_government), ef_government[which(code == "SRB" & year == 2002)], ef_government))
data_question1 <- data_question1 %>%
  mutate(ef_government = ifelse(code == "SRB" & year == 2001 & is.na(ef_government), ef_government[which(code == "SRB" & year == 2002)], ef_government))
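
The three assignments above copy the first observed value backwards into the missing years at the start of the series. A small helper can express the same idea for any series; the function name `fill_leading` and the example values are hypothetical and only meant as a sketch.

```r
# Fill NAs at the start of a series with the first observed value
# (next observation carried backward); later gaps are left untouched
fill_leading <- function(x) {
  first_obs <- which(!is.na(x))[1]
  if (!is.na(first_obs) && first_obs > 1) {
    x[seq_len(first_obs - 1)] <- x[first_obs]
  }
  x
}

ef_government <- c(NA, NA, 6.5, 6.7, 6.6)  # made-up values, years 2000-2004
print(fill_leading(ef_government))
```

Applied per country, such a helper would avoid hard-coding the years, at the cost of hiding which countries and years were actually filled.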

2.3.11.1.4.3 Economic freedom: money

18 countries have missing values, but the percentage of missing values is always below 25%.

Code
question1_missing_ef_money <- data_question1 %>%
  group_by(code) %>%
  summarize(Na_ef_money = mean(is.na(ef_money)))%>%
  filter(Na_ef_money != 0)

We look at the evolution of the variable over time. For the countries where this evolution is linear, we fill in the missing values using linear interpolation.

Code
question1_missing_ef_money <- data_question1 %>%
  group_by(code) %>%
  mutate(PercentageMissing = mean(is.na(ef_money))) %>% # Column % NAs
  filter(code %in% question1_missing_ef_money$code)

Evol_Missing_ef_money <- ggplot(data = question1_missing_ef_money) +
  geom_line(aes(x = year, y = ef_money, 
                 color = cut(PercentageMissing,
                             breaks = c(0, 0.1, 0.2, 0.3, 1),
                             labels = c("0-10%", "10-20%", "20-30%", "30-100%")))) +
  labs(title = "Evolution of economic freedom: money over time", x = "Years from 2000 to 2022", y = "ef_money") +
  scale_color_manual(values = c("0-10%" = "blue", "10-20%" = "green", "20-30%" = "red", "30-100%" = "black"),
                     labels = c("0-10%", "10-20%", "20-30%", "30-100%")) +
  guides(color = guide_legend(title = "% missings")) +
  facet_wrap(~ code, nrow = 4) +
  scale_x_continuous(breaks = NULL)

print(Evol_Missing_ef_money)

list_code <- c("ARM", "BFA", "BIH", "GEO", "KAZ", "LSO", "MDA", "MKD")
for (i in list_code) {
  country_data <- data_question1 %>% filter(code == i)
  interpolated_data <- na.interp(country_data$ef_money)
  data_question1[data_question1$code == i, "ef_money"] <- interpolated_data
}

Then, we look at the distribution of the variable per region. Seeing that all are skewed distributions, we decide to replace the missing values using the median by region.

Code
question1_missing_ef_money <- data_question1 %>%
  group_by(code) %>%
  mutate(PercentageMissing = mean(is.na(ef_money))) %>% # Column % NAs
  ungroup() %>%
  group_by(region) %>%
  filter(sum(PercentageMissing, na.rm = TRUE) > 0)

Freq_Missing_ef_money <- ggplot(data = question1_missing_ef_money) +
  geom_histogram(aes(x = ef_money, 
                     fill = cut(PercentageMissing,
                                breaks = c(0, 0.1, 0.2, 0.3, 1),
                                labels = c("0-10%", "10-20%", "20-30%", "30-100%"))),
                 bins = 30) +
  labs(title = "Distribution of economic freedom: money", x = "ef_money", y = "Frequency") +
  scale_fill_manual(values = c("0-10%" = "blue", "10-20%" = "green", "20-30%"="red","30-100%" = "black"), labels = c("0-10%", "10-20%", "20-30%","30-100%")) +
  guides(fill = guide_legend(title = "% missings")) +
  facet_wrap(~ region, nrow = 2)

print(Freq_Missing_ef_money)

data_question1 <- data_question1 %>%
  group_by(code) %>%
  mutate(
    PercentageMissingByCode = mean(is.na(ef_money))
  ) %>%
  ungroup() %>% 
  group_by(region) %>%
  mutate(
    MedianByRegion = median(ef_money, na.rm = TRUE),
    ef_money = ifelse(
      PercentageMissingByCode < 0.3 & !is.na(ef_money),
      ef_money,
      ifelse(PercentageMissingByCode < 0.3, MedianByRegion, ef_money)
    )
  ) %>%
  select(-PercentageMissingByCode, -MedianByRegion)

2.3.11.1.4.4 Economic freedom: trade

19 countries have missing values, but the percentage of missing values is always below 25%.

Code
question1_missing_ef_trade <- data_question1 %>%
  group_by(code) %>%
  summarize(Na_ef_trade = mean(is.na(ef_trade)))%>% # Column % NAs
  filter(Na_ef_trade != 0)

question1_missing_ef_trade <- data_question1 %>%
  group_by(code) %>%
  mutate(PercentageMissing = mean(is.na(ef_trade))) %>%
  filter(code %in% question1_missing_ef_trade$code)

We look at the evolution of the variable over time. For the countries where this evolution is linear, we fill in the missing values using linear interpolation.

Code
Evol_Missing_ef_trade <- ggplot(data = question1_missing_ef_trade) +
  geom_line(aes(x = year, y = ef_trade, 
                 color = cut(PercentageMissing,
                             breaks = c(0, 0.1, 0.2, 0.3, 1),
                             labels = c("0-10%", "10-20%", "20-30%", "30-100%")))) +
  labs(title = "Evolution of economic freedom: trade over time", x = "Years from 2000 to 2022", y = "ef_trade") +
  scale_color_manual(values = c("0-10%" = "blue", "10-20%" = "green", "20-30%" = "red", "30-100%" = "black"),
                     labels = c("0-10%", "10-20%", "20-30%", "30-100%")) +
  guides(color = guide_legend(title = "% missings")) +
  facet_wrap(~ code, nrow = 4) +
  scale_x_continuous(breaks = NULL)

print(Evol_Missing_ef_trade)

# Linear interpolation for "AZE", "BFA", "ETH", "GEO", "VNM"
list_code <- c("AZE", "BFA", "ETH", "GEO", "VNM")
for (i in list_code) {
  country_data <- data_question1 %>% filter(code == i)
  interpolated_data <- na.interp(country_data$ef_trade)
  data_question1[data_question1$code == i, "ef_trade"] <- interpolated_data
}

Then, we look at the distribution of the variable per region. Seeing that all are skewed distributions, we decide to replace the missing values using the median by region.

Code
question1_missing_ef_trade <- data_question1 %>%
  group_by(code) %>%
  mutate(PercentageMissing = mean(is.na(ef_trade))) %>% # Column % NAs
  ungroup() %>%
  group_by(region) %>%
  filter(sum(PercentageMissing, na.rm = TRUE) > 0)

Freq_Missing_ef_trade <- ggplot(data = question1_missing_ef_trade) +
  geom_histogram(aes(x = ef_trade, 
                     fill = cut(PercentageMissing,
                                breaks = c(0, 0.1, 0.2, 0.3, 1),
                                labels = c("0-10%", "10-20%", "20-30%", "30-100%"))),
                 bins = 30) +
  labs(title = "Distribution of economic freedom: trade", x = "ef_trade", y = "Frequency") +
  scale_fill_manual(values = c("0-10%" = "blue", "10-20%" = "green", "20-30%"="red","30-100%" = "black"), labels = c("0-10%", "10-20%", "20-30%","30-100%")) +
  guides(fill = guide_legend(title = "% missings")) +
  facet_wrap(~ region, nrow = 2)

print(Freq_Missing_ef_trade)

data_question1 <- data_question1 %>%
  group_by(code) %>%
  mutate(
    PercentageMissingByCode = mean(is.na(ef_trade))
  ) %>%
  ungroup() %>% 
  group_by(region) %>%
  mutate(
    MedianByRegion = median(ef_trade, na.rm = TRUE),
    ef_trade = ifelse(
      PercentageMissingByCode < 0.3 & !is.na(ef_trade),
      ef_trade,
      ifelse(PercentageMissingByCode < 0.3, MedianByRegion, ef_trade)
    )
  ) %>%
  select(-PercentageMissingByCode, -MedianByRegion)

2.3.11.1.4.5 Economic freedom: regulation

12 countries have missing values, but the percentage of missing values is always below 25%.

Code
question1_missing_ef_regulation <- data_question1 %>%
  group_by(code) %>%
  summarize(Na_ef_regulation = mean(is.na(ef_regulation)))%>% # Column % NAs
  filter(Na_ef_regulation != 0)

We look at the evolution of the variable over time. For the countries where this evolution is linear, we fill in the missing values using linear interpolation.

Code
question1_missing_ef_regulation <- data_question1 %>%
  group_by(code) %>%
  mutate(PercentageMissing = mean(is.na(ef_regulation))) %>%
  filter(code %in% question1_missing_ef_regulation$code)

Evol_Missing_ef_regulation <- ggplot(data = question1_missing_ef_regulation) +
  geom_line(aes(x = year, y = ef_regulation, 
                 color = cut(PercentageMissing,
                             breaks = c(0, 0.1, 0.2, 0.3, 1),
                             labels = c("0-10%", "10-20%", "20-30%", "30-100%")))) +
  labs(title = "Evolution of economic freedom: regulation over time", x = "Years from 2000 to 2022", y = "ef_regulation") +
  scale_color_manual(values = c("0-10%" = "blue", "10-20%" = "green", "20-30%" = "red", "30-100%" = "black"),
                     labels = c("0-10%", "10-20%", "20-30%", "30-100%")) +
  guides(color = guide_legend(title = "% missings")) +
  scale_x_continuous(breaks = NULL)+
  facet_wrap(~ code, nrow = 2)

print(Evol_Missing_ef_regulation)

list_code <- c("ETH", "KAZ", "MDA", "SRB")
for (i in list_code) {
  country_data <- data_question1 %>% filter(code == i)
  interpolated_data <- na.interp(country_data$ef_regulation)
  data_question1[data_question1$code == i, "ef_regulation"] <- interpolated_data
}

Then, we look at the distribution of the variable per region. Seeing that all are skewed distributions, we decide to replace the missing values using the median by region.

Code
question1_missing_ef_regulation <- data_question1 %>%
  group_by(code) %>%
  mutate(PercentageMissing = mean(is.na(ef_regulation))) %>% # Column % NAs
  ungroup() %>%
  group_by(region) %>%
  filter(sum(PercentageMissing, na.rm = TRUE) > 0)

Freq_Missing_ef_regulation <- ggplot(data = question1_missing_ef_regulation) +
  geom_histogram(aes(x = ef_regulation, 
                     fill = cut(PercentageMissing,
                                breaks = c(0, 0.1, 0.2, 0.3, 1),
                                labels = c("0-10%", "10-20%", "20-30%", "30-100%"))),
                 bins = 30) +
  labs(title = "Distribution of economic freedom: regulation", x = "ef_regulation", y = "Frequency") +
  scale_fill_manual(values = c("0-10%" = "blue", "10-20%" = "green", "20-30%"="red","30-100%" = "black"), labels = c("0-10%", "10-20%", "20-30%","30-100%")) +
  guides(fill = guide_legend(title = "% missings")) +
  facet_wrap(~ region, nrow = 1)

print(Freq_Missing_ef_regulation)

data_question1 <- data_question1 %>%
  group_by(code) %>%
  mutate(
    PercentageMissingByCode = mean(is.na(ef_regulation))
  ) %>%
  ungroup() %>% 
  group_by(region) %>%
  mutate(
    MedianByRegion = median(ef_regulation, na.rm = TRUE),
    ef_regulation = ifelse(
      PercentageMissingByCode < 0.3 & !is.na(ef_regulation),
      ef_regulation,
      ifelse(PercentageMissingByCode < 0.3, MedianByRegion, ef_regulation)
    )
  ) %>%
  select(-PercentageMissingByCode, -MedianByRegion) %>%
  ungroup()

At this point, the only remaining missing values are in goals 1 and 10. As before, we investigate where the NAs are located in our dataset, first for goal 1 and then for goal 10.

Code
na_count <- sapply(data_question1, function(x) sum(is.na(x)))
na_count_df <- data.frame(variable = names(na_count), num_missing = na_count)
na_count_df_filtered <- subset(na_count_df, num_missing > 0)
ggplot(na_count_df_filtered, aes(x= num_missing,y=variable, fill = num_missing)) +
    geom_bar(aes(fill = num_missing), stat = "identity", width = 0.8, fill = 'lightblue') +
    geom_text(aes(label = num_missing), vjust = 0.5,hjust = 1.1, position = position_dodge(width = 0.9)) +
    theme_minimal() +
    theme(axis.text.y = element_text(hjust = 1, size=10 ), 
          legend.text = element_text(size = 8),
          legend.title = element_text(size = 10)) +
    labs(title = "Number of remaining missing values per variable ",
         x = "Number of NAs",
         y = "Variables")

# goal1
question1_missing_goal1 <- data_question1 %>%
  group_by(code) %>%
  summarize(Na_goal1 = mean(is.na(goal1)))%>%
  filter(Na_goal1 != 0)

data_question1 <- data_question1 %>% filter(!code %in% question1_missing_goal1$code)

# Update List of countries deleted
list_country_deleted <- c(list_country_deleted, "KWT","NZL","OMN","SGP","UKR")

# still 42 NA values goal10

The missing values for goal 1 are located in only 5 countries, so we remove these countries. At this stage, only 42 missing values remain. We then repeat the same step for goal 10.

Code
#goal10
question1_missing_goal10 <- data_question1 %>%
  group_by(code) %>%
  summarize(Na_goal10 = mean(is.na(goal10)))%>%
  filter(Na_goal10 != 0)

data_question1 <- data_question1 %>% filter(!code %in% question1_missing_goal10$code)

# Update List of countries deleted
list_country_deleted <- c(list_country_deleted, "GUY","TTO")

This removes the last 2 countries containing missing values. Our dataset is now complete and ready to be used for question 1.

2.3.11.2 Data for question 2 and 4

We create a column with the number of missing values by country over all the variables, except goal 1 and goal 10 that we already discussed. Since there are no other missing values, we stop here.

Code
see_missing24 <- data_question24 %>%
  group_by(code) %>%
  summarise(across(everything(), ~ sum(is.na(.))) %>%
              mutate(num_missing = rowSums(across(everything()))) %>%
              filter(num_missing > 0))
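
The per-country count of missing values can also be obtained in a flatter form with base R. The sketch below is only an illustration on a made-up data frame (`toy`, with hypothetical country codes); it is not the code used to produce our results.

```r
# Toy data: two rows per country, made-up scores
toy <- data.frame(
  code  = c("AAA", "AAA", "BBB", "BBB", "CCC", "CCC"),
  goal2 = c(50, NA, 60, 61, 70, 71),
  goal3 = c(NA, NA, 40, 41, 30, 31)
)

value_cols <- setdiff(names(toy), "code")

# is.na() gives a logical matrix; rowsum() adds it up within each country
na_by_country <- rowsum(+is.na(toy[value_cols]), group = toy$code)

# Total per country, keeping only countries with at least one NA
num_missing <- rowSums(na_by_country)
print(num_missing[num_missing > 0])
```

An empty result means that no country has missing values, which is the situation we observe for the data of questions 2 and 4.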

2.3.11.3 Data for question 3

Disasters

We begin by visualizing the missing values.

Code
variable_names <- names(data_question3_1)
missing_percentages <- sapply(data_question3_1, function(col) mean(is.na(col)) * 100)

missing_data_summary <- data.frame(
  Variable = variable_names,
  Missing_Percentage = missing_percentages
)

missing_data_summary <- missing_data_summary %>%
  mutate(VariableGroup = ifelse(startsWith(Variable, "goal") & Missing_Percentage == 0, "Goals without NAs", as.character(Variable)))

ggplot(data = missing_data_summary, aes(x = reorder(VariableGroup, Missing_Percentage), y = Missing_Percentage, fill = Missing_Percentage)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = ifelse(Missing_Percentage > 1, sprintf("%.1f%%", Missing_Percentage), ""),
                y = Missing_Percentage),
            position = position_stack(vjust = 1),  # Adjust vertical position
            color = "white",  # Text color
            size = 3,          # Text size
            hjust = 1.05) +
  labs(title = "Percentage of Missing Values by Variable",
       x = "Variable",
       y = "Missing Percentage") +
  theme_minimal() +
  theme(axis.text.y = element_text(hjust = 1)) +
  coord_flip()

We create a column with the number of missing values by country over all the variables, except goal 1 and goal 10 that we already discussed. We find that there are many missing values; the first few rows below show them by country.

Code
see_missing3_1 <- data_question3_1 %>%
  group_by(code) %>%
  summarise(across(-c(goal1, goal10),  # Exclude columns "goal1" and "goal10"
                   ~ sum(is.na(.))) %>%
              mutate(num_missing = rowSums(across(everything()))) %>%
              filter(num_missing > 0))
for_kable <- head(see_missing3_1, 10)
kable(for_kable)
code year country continent region overallscore goal2 goal3 goal4 goal5 goal6 goal7 goal8 goal9 goal11 goal12 goal13 goal15 goal16 total_deaths no_injured no_affected no_homeless total_affected total_damages num_missing
AGO 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 6
ALB 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 9 9 9 9 9 54
ARE 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 21 21 21 21 21 21 126
ARM 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15 15 15 15 15 15 90
AUT 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 8 8 8 8 8 48
AZE 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 17 17 17 17 17 17 102
BDI 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 3 3 3 3 3 18
BEL 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 5 5 5 5 30
BEN 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 7 7 7 7 7 42
BFA 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 5 5 5 5 30

In this particular case, although the disaster dataset contains many missing values, we assume that a disaster does not occur in every country every year, given that these are uncontrollable and non-recurring events. We therefore replace the NAs with zeros, meaning that no climatic disaster was recorded for that country and year.

Code
data_question3_1[is.na(data_question3_1)] <- 0
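
The line above replaces every NA in the data frame. If a score column such as goal1 still contained missing values, they would also become a score of 0. A more defensive variant, sketched below on made-up data, restricts the replacement to the disaster columns; the data frame `toy` is hypothetical and only two of our disaster columns are shown.

```r
# Toy data (all values made up)
toy <- data.frame(
  code          = c("AAA", "AAA", "BBB"),
  goal1         = c(55, NA, 80),        # a missing score should stay NA
  total_deaths  = c(12, NA, NA),
  total_damages = c(NA, 3000, NA)
)

disaster_cols <- c("total_deaths", "total_damages")

# Replace NA by 0 in the disaster columns only
toy[disaster_cols] <- lapply(toy[disaster_cols],
                             function(x) ifelse(is.na(x), 0, x))
print(toy)
```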

COVID19

We create a column with the number of missing values by country over all the variables, except goal 1 and goal 10 that we already discussed. Since there are no other missing values, we stop here.

Code
see_missing3_2 <- data_question3_2 %>%
  group_by(code) %>%
  summarise(across(-c(goal1, goal10),  # Exclude columns "goal1" and "goal10"
                   ~ sum(is.na(.))) %>%
              mutate(num_missing = rowSums(across(everything()))) %>%
              filter(num_missing > 0))

Conflicts

We create a column with the number of missing values by country over all the variables, except goal 1 and goal 10 that we already discussed. Two countries (MNE and SRB) have missing values, so we remove them.

Code
see_missing3_3 <- data_question3_3 %>%
  group_by(code) %>%
  summarise(across(-c(goal1, goal10),  # Exclude columns "goal1" and "goal10"
                   ~ sum(is.na(.))) %>%
              mutate(num_missing = rowSums(across(everything()))) %>%
              filter(num_missing > 0))

data_question3_3 <- data_question3_3 %>% filter(!code %in% c("MNE","SRB"))

##### EXPORT as CSV #####
write.csv(data_question1, file = here("scripts","data","data_question1.csv"))
write.csv(data_question24, file = here("scripts","data","data_question24.csv"))
write.csv(data_question3_1, file = here("scripts","data","data_question3_1.csv"))
write.csv(data_question3_2, file = here("scripts","data","data_question3_2.csv"))
write.csv(data_question3_3, file = here("scripts","data","data_question3_3.csv"))

3 Exploratory data analysis

3.1 General exploration

We display the distribution of the different SDG achievement scores, using boxplots to have an overview of the median, the range with most of the observations and the outliers.

Code
data_question1 <- read.csv(here("scripts","data","data_question1.csv"))
data_question24 <- read.csv(here("scripts", "data", "data_question24.csv"))
data_question2 <- read.csv(here("scripts", "data", "data_question24.csv"))
data_question3_1 <- read.csv(here("scripts", "data", "data_question3_1.csv"))
data_question3_2 <- read.csv(here("scripts", "data", "data_question3_2.csv"))
data_question3_3 <- read.csv(here("scripts", "data", "data_question3_3.csv"))
Q3.1 <- read.csv(here("scripts", "data", "data_question3_1.csv"))
Q3.2 <- read.csv(here("scripts", "data", "data_question3_2.csv"))
Q3.3 <- read.csv(here("scripts", "data", "data_question3_3.csv"))
data <- read.csv(here("scripts", "data", "all_Merge.csv"))

Correlation_overall <- data_question1 %>% 
      select(population:ef_regulation)

#### boxplots ####

#for goals
#dev.off()
# boxplot(Correlation_overall[2:18], 
#         las = 2,            # Makes the axis labels perpendicular to the axis
#         par(mar = c(5, 4, 4, 2) + 0.1),  # Adjusts the margins to fit all labels
#         cex.axis = 0.7,      # Reduces the size of the axis labels
#         cex.lab = 1,       # Reduces the size of the x and y labels
#         notch = TRUE,       # Specifies whether to add notches or not
#         main = "Merged goals boxplot", # Title of the boxplot
#         xlab = "Goals",  # X-axis label
#         ylab = "Score")     # Y-axis label

#boxplot per continent

data_Q1_Africa <- data_question1 %>%
  filter(continent == 'Africa')
data_Q1_Europe <- data_question1 %>%
  filter(continent == 'Europe')
data_Q1_Asia <- data_question1 %>%
  filter(continent == 'Asia')
data_Q1_Americas <- data_question1 %>%
  filter(continent == 'Americas')
data_Q1_Oceania <- data_question1 %>%
  filter(continent == 'Oceania')

#Africa
data_Q1_Africa_long <- melt(data_Q1_Africa[,8:24])
medians_AF <- data_Q1_Africa_long %>%
  group_by(variable) %>%
  summarize(median_value = median(value))
medians_AF$color <- ifelse(medians_AF$median_value > 75, "lightblue", 
                        ifelse(medians_AF$median_value < 25, "red", 'orange'))
data_Q1_Africa_long <- data_Q1_Africa_long %>%
  left_join(medians_AF, by = "variable")

bandwidth_nrd_AF <- bw.nrd(data_Q1_Africa_long$value)
ggplot(data_Q1_Africa_long, aes(x = variable, y = value, fill = color)) + 
  geom_violin(trim = FALSE, bw = bandwidth_nrd_AF) +  
  scale_fill_identity() +
  labs(title = "African SDG goals boxplot", x = "Goals", y = "Score") +
  geom_boxplot(width = 0.1, outlier.size = 1, fill = 'white') +
  scale_y_continuous(labels = scales::label_number()) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

#Europe
data_Q1_Europe_long <- melt(data_Q1_Europe[,8:24])

medians_EU <- data_Q1_Europe_long %>%
  group_by(variable) %>%
  summarize(median_value = median(value))

medians_EU$color <- ifelse(medians_EU$median_value > 75, "lightblue", 
                        ifelse(medians_EU$median_value < 25, "red", 'orange'))

data_Q1_Europe_long <- data_Q1_Europe_long %>%
  left_join(medians_EU, by = "variable")

bandwidth_nrd_EU <- bw.nrd(data_Q1_Europe_long$value)
ggplot(data_Q1_Europe_long, aes(x = variable, y = value, fill = color)) + 
  geom_violin(trim = FALSE, bw = bandwidth_nrd_EU) +
  scale_fill_identity() +
  labs(title = "European SDG goals boxplot", x = "Goals", y = "Score") +
  geom_boxplot(width = 0.1, outlier.size = 1, fill = 'white') +
  scale_y_continuous(labels = scales::label_number()) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

#Asia
data_Q1_Asia_long <- melt(data_Q1_Asia[,8:24])

medians_AS <- data_Q1_Asia_long %>%
  group_by(variable) %>%
  summarize(median_value = median(value))

medians_AS$color <- ifelse(medians_AS$median_value > 75, "lightblue", 
                        ifelse(medians_AS$median_value < 25, "red", 'orange'))

data_Q1_Asia_long <- data_Q1_Asia_long %>%
  left_join(medians_AS, by = "variable")

bandwidth_nrd_AS <- bw.nrd(data_Q1_Asia_long$value)
ggplot(data_Q1_Asia_long, aes(x = variable, y = value, fill = color)) + 
  geom_violin(trim = FALSE, bw = bandwidth_nrd_AS) +
  scale_fill_identity() +
  labs(title = "Asian SDG goals boxplot", x = "Goals", y = "Score") +
  geom_boxplot(width = 0.1, outlier.size = 1, fill = 'white') +
  scale_y_continuous(labels = scales::label_number()) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

#Americas
data_Q1_Americas_long <- melt(data_Q1_Americas[,8:24])

medians_AM <- data_Q1_Americas_long %>%
  group_by(variable) %>%
  summarize(median_value = median(value))

medians_AM$color <- ifelse(medians_AM$median_value > 75, "lightblue", 
                        ifelse(medians_AM$median_value < 25, "red", 'orange'))

data_Q1_Americas_long <- data_Q1_Americas_long %>%
  left_join(medians_AM, by = "variable")

bandwidth_nrd_AM <- bw.nrd(data_Q1_Americas_long$value)
ggplot(data_Q1_Americas_long, aes(x = variable, y = value, fill = color)) + 
  geom_violin(trim = FALSE, bw = bandwidth_nrd_AM) +
  scale_fill_identity() +
  labs(title = "American SDG goals boxplot", x = "Goals", y = "Score") +
  geom_boxplot(width = 0.1, outlier.size = 1, fill = 'white') +
  scale_y_continuous(labels = scales::label_number()) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

#Oceania
data_Q1_Oceania_long <- melt(data_Q1_Oceania[,8:24])

medians_OC <- data_Q1_Oceania_long %>%
  group_by(variable) %>%
  summarize(median_value = median(value))

medians_OC$color <- ifelse(medians_OC$median_value > 75, "lightblue", 
                        ifelse(medians_OC$median_value < 25, "red", 'orange'))

data_Q1_Oceania_long <- data_Q1_Oceania_long %>%
  left_join(medians_OC, by = "variable")

bandwidth_nrd_OC <- bw.nrd(data_Q1_Oceania_long$value)
ggplot(data_Q1_Oceania_long, aes(x = variable, y = value, fill = color)) + 
  geom_violin(trim = FALSE, bw = bandwidth_nrd_OC) +
  scale_fill_identity() +
  labs(title = "Oceanian SDG goals boxplot", x = "Goals", y = "Score") +
  geom_boxplot(width = 0.1, outlier.size = 1, fill = 'white') +
  scale_y_continuous(labels = scales::label_number()) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

# Correlation_goals <- melt(Correlation_overall[,2:18])
# ggplot(Correlation_goals, aes(x= variable, y= value)) + 
#   geom_violin(trim=FALSE, fill="orange") +
#   labs(title="Merged goals violin boxplot",x="Goals", y = "Distribution") +
#   geom_boxplot(width=0.1, outlier.size = 1) +
#   scale_y_continuous(labels = scales::label_number()) + #limits = c(0, 100)
#   theme_classic() +
#   theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

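# Note: the violins extend below 0 and above 100 because trim = FALSE draws the
# density tails beyond the observed range; the scores themselves only go from 0 to 100.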

We see different patterns among the goals. Some are quite homogeneous, with a small spread of values (e.g. overall score, goals 2 and 8), while others have a large spread (e.g. goals 1 and 10). Goals 1, 3, 4, 7, 9, 10 and 13 have values across the whole range of possible scores. Goals 2, 5, 8, 13 and 17 have outliers, that is, values lying beyond the whiskers of the boxplot. It is interesting to see that goal 8 (decent work and economic growth) has the smallest spread of values, whereas goal 1 (no poverty) has the largest distance between the first and the third quartile. Goal 2 (zero hunger) has a tight spread of values but the largest number of outliers among the low values, meaning that the situation is similar across most countries but, where it differs, it is much worse.
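
Two conventions matter when reading these plots, and both can be checked on made-up data with base R. First, a boxplot flags as outliers the points lying more than 1.5 times the interquartile range away from the box. Second, an untrimmed kernel density is evaluated beyond the observed range, which is why a violin can reach below 0 or above 100 even though no score does. The sketch uses `boxplot.stats()` and `density()` as stand-ins for what the plotting layers compute; the vector `scores` is hypothetical.

```r
set.seed(1)
scores <- c(runif(200, min = 60, max = 100), 5, 10)  # made-up scores in [0, 100]

# Boxplot rule: points further than 1.5 * IQR from the box are flagged as outliers
stats <- boxplot.stats(scores, coef = 1.5)
print(stats$out)

# Kernel density: with cut = 3 the grid extends 3 bandwidths beyond the data,
# with cut = 0 it stops at the observed minimum and maximum
d_untrimmed <- density(scores, cut = 3)
d_trimmed   <- density(scores, cut = 0)
print(range(d_untrimmed$x))
print(range(d_trimmed$x))
```

Setting `trim = TRUE` in `geom_violin()` would keep the violins within the observed range, at the cost of cutting the density abruptly at the extremes.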

We now display boxplots for the different variables of the Human Freedom Index, and then for our other independent variables.

Code
#for Human Freedom Index scores 


#Oceania

data_Q1_Oceania_HFI_long <- melt(data_Q1_Oceania[,29:40])

medians_HFI_OC <- data_Q1_Oceania_HFI_long %>%
  group_by(variable) %>%
  summarize(median_value = median(value))

medians_HFI_OC$color <- ifelse(medians_HFI_OC$median_value > 7.5, "lightblue", 
                        ifelse(medians_HFI_OC$median_value < 2.5, "red", 'orange'))

data_Q1_Oceania_HFI_long <- data_Q1_Oceania_HFI_long %>%
  left_join(medians_HFI_OC, by = "variable")

bandwidth_nrd_HFI_OC <- bw.nrd(data_Q1_Oceania_HFI_long$value)
ggplot(data_Q1_Oceania_HFI_long, aes(x = variable, y = value, fill = color)) + 
  geom_violin(trim = FALSE, bw = bandwidth_nrd_HFI_OC) +
  scale_fill_identity() +
  labs(title = "Oceanian HFI Scores boxplot", x = "Human Freedom Index goals", y = "Score") +
  geom_boxplot(width = 0.1, outlier.size = 1, fill = 'white') +
  scale_y_continuous(labels = scales::label_number()) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

par(mar = c(7, 5, 2, 1))    # Adjusts the margins to fit all labels
boxplot(Correlation_overall[23:34], 
        las = 2,            # Makes the axis labels perpendicular to the axis
        cex.axis = 0.7,      # Reduces the size of the axis labels
        cex.lab = 1,       # Reduces the size of the x and y labels
        notch = TRUE,       # Specifies whether to add notches or not
        main = "Merged Human Freedom Index scores boxplot", 
        ylab = "Score")     # Y-axis label


Correlation_HFI <- melt(Correlation_overall[,23:34])
ggplot(Correlation_HFI, aes(x= variable, y= value)) + 
  geom_violin(trim=FALSE, fill="orange")+
  labs(title="Merged Human Freedom Index scores violin boxplot",x="Variables", y = "Score")+
  geom_boxplot(width=0.1, outlier.size = 1)+
  scale_y_continuous(labels = scales::label_number()) + #limits = c(0, 100)
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

v1 <- ggplot(Correlation_overall, aes(x= factor(1), y= GDPpercapita)) + 
  geom_violin(trim=FALSE, fill="orange")+
  labs(title="Violin plot of GDP per capita",x="GDP per capita", y = "Distribution")+
  geom_boxplot(width=0.1, outlier.size = 1)+
  scale_y_continuous(labels = scales::label_number()) +  # Format y-axis labels
  theme_classic()
v2 <- ggplot(Correlation_overall, aes(x= factor(1), y= unemployment.rate)) + 
  geom_violin(trim=FALSE, fill="orange")+
  labs(title="Violin plot of unemployment rate",x="Unemployment rate", y = "Distribution")+
  geom_boxplot(width=0.1, outlier.size = 1)+
  scale_y_continuous(labels = scales::label_number()) +  # Format y-axis labels
  theme_classic()
v3 <- ggplot(Correlation_overall, aes(x= factor(1), y= MilitaryExpenditurePercentGDP)) + 
  geom_violin(trim=FALSE, fill="orange")+
  labs(title="Violin plot of military expenditure by percentage of GDP",x="Military Expenditure", y = "Distribution")+
  geom_boxplot(width=0.1, outlier.size = 1)+
  scale_y_continuous(labels = scales::label_number()) +  # Format y-axis labels
  theme_classic()
v4 <- ggplot(Correlation_overall, aes(x= factor(1), y= internet_usage)) + 
  geom_violin(trim=FALSE, fill="orange")+
  labs(title="Violin plot of internet_usage",x="internet_usage", y = "Distribution")+
  geom_boxplot(width=0.1, outlier.size = 1)+
  scale_y_continuous(labels = scales::label_number()) +  # Format y-axis labels
  theme_classic()
grid.arrange(v1,v2,v3,v4, ncol = 2, nrow = 2)

We now look at the variables in a summary table to have a more precise view of the numbers.

Variable       Min.    1st Qu.   Median   Mean    3rd Qu.   Max.    NA's
X              1       892       1783     1783    2674      3565    -
year           2000    2005      2011     2011    2017      2022    -
overallscore   37.4    55.0      65.5     64.0    72.4      86.8    -
goal1          0.0     44.5      87.4     71.7    98.8      100.0   276
goal2          16.5    52.6      58.9     58.0    65.3      83.4    -
goal3          5.9     44.3      70.9     64.1    81.4      97.3    -
goal4          0.0     55.6      80.6     72.0    94.5      100.0   -
goal5          3.5     43.2      58.0     56.0    68.9      94.0    -
goal6          23.3    53.0      65.3     65.0    75.2      95.1    -
goal7          0.1     41.5      65.5     57.9    72.6      99.6    -
goal8          40.0    64.0      70.2     70.0    76.6      88.7    -
goal9          0.3     15.5      29.4     37.5    53.9      99.2    -
goal10         0.0     35.2      62.2     58.3    81.6      100.0   276
goal11         20.3    55.8      75.3     70.3    85.1      99.1    -
goal12         32.9    67.9      84.6     79.3    94.1      99.0    -
goal13         0.0     72.9      90.8     82.1    97.2      99.9    -
goal15         26.0    55.0      65.1     65.0    74.3      97.9    -
goal16         27.9    51.5      61.4     62.6    74.6      96.0    -
goal17         15.1    46.1      55.4     55.7    65.1      96.8    -

code, country, continent, region: character columns, length 3565.

3.2 Focus on the influence of the factors over the SDG scores

After importing our cleaned data, we first looked at the correlations between our numerical variables.

Code
#### Correlations between variables ####

sdg_scores2 <- data_question1[, c('goal1', 'goal2', 'goal3', 'goal4', 'goal5', 'goal6',
                      'goal7', 'goal8', 'goal9', 'goal10', 'goal11', 'goal12',
                      'goal13', 'goal15', 'goal16', 'goal17')]

Correlation_overall <- data_question1 %>% 
      select(population:ef_regulation)

#before computing Pearson -> logarithmic transformation may be required for some variables with high skewness

# Calculating skewness for each variable

Correlation_overall_skew <- Correlation_overall
Correlation_overall_sqrt <- Correlation_overall

skewness_values <- sapply(Correlation_overall_skew, e1071::skewness)

# Identifying highly skewed variables
highly_skewed_vars <- names(skewness_values[abs(skewness_values) > 1])
highly_skewed_vars_sqrt <- names(skewness_values[abs(skewness_values) > 1])
# Applying logarithmic transformation
Correlation_overall_skew[highly_skewed_vars] <- lapply(Correlation_overall_skew[highly_skewed_vars], function(x) log1p(x))
# Applying square root transformation
Correlation_overall_sqrt[highly_skewed_vars_sqrt] <- lapply(Correlation_overall_sqrt[highly_skewed_vars_sqrt], function(x) sqrt(x))

new_skewness_values <- sapply(Correlation_overall_skew[highly_skewed_vars], e1071::skewness)
new_skewness_values_sqrt <- sapply(Correlation_overall_sqrt[highly_skewed_vars_sqrt], e1071::skewness)

# After transformation, many variables still have high skewness values; check with print(new_skewness_values) and print(new_skewness_values_sqrt)

cor_matrix_log <- cor(Correlation_overall_skew, use = "everything")
cor_matrix_sqrt <- cor(Correlation_overall_sqrt, use = "everything")
cor_matrix_sper <- cor(Correlation_overall, method = "spearman", use = "everything")

datatable(cor_matrix_log, 
          options = list(
            pageLength = 10, 
            class = "hover", 
            searchHighlight = TRUE,
            columnDefs = list(
              list(targets = "_all",
                   render = JS(
                     "function(data, type, row, meta){",
                     "  if(type === 'display'){",
                     "    return parseFloat(data).toFixed(2)",
                     "  }",
                     "  return data;",
                     "}")))),
          rownames = FALSE)

Doing so yields many positive and negative correlations. To better understand them and get an overall view of the situation, we used the following heatmaps.

Code
#### Heatmap ####

#with log transformation

cor_melted_log <- melt(cor_matrix_log)

ggplot(data = cor_melted_log, aes(Var1, Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
                       midpoint = 0, limit = c(-1, 1), space = "Lab", 
                       name="Pearson\nCorrelation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, size = 8, hjust = 1),
        axis.text.y = element_text(size = 8)) +
  coord_fixed() +
  labs(x = '', y = '', title = 'Correlation Matrix Heatmap')

#with square root transformation

cor_melted_sqrt <- melt(cor_matrix_sqrt)

ggplot(data = cor_melted_sqrt, aes(Var1, Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
                       midpoint = 0, limit = c(-1, 1), space = "Lab", 
                       name="Pearson\nCorrelation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, size = 8, hjust = 1),
        axis.text.y = element_text(size = 8)) +
  coord_fixed() +
  labs(x = '', y = '', title = 'Correlation Matrix Heatmap')

#with Spearman

cor_melted <- melt(cor_matrix_sper)

ggplot(data = cor_melted, aes(Var1, Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
                       midpoint = 0, limit = c(-1, 1), space = "Lab", 
                       name="Spearman\nCorrelation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, size = 8, hjust = 1),
        axis.text.y = element_text(size = 8)) +
  coord_fixed() +
  labs(x = '', y = '', title = 'Correlation Matrix Heatmap')



#do 3 different heatmaps : goals on goals, goals on other variables except goals, variables on variables (except goals)

In the correlation matrix heatmap, we notice that many of goals 1 to 11 are positively correlated with one another. On the other hand, goals 12 and 13 have negative relationships with the majority of our variables, but are strongly positively correlated with each other. In addition, we notice strong correlations among the personal freedom (pf) variables of the Human Freedom Index, namely the scores on movement, religion, assembly and expression.
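
Patterns of this kind are easier to read when the correlation matrix is split into blocks (goals against goals, goals against the other variables, and the other variables among themselves). The sketch below shows one possible way to do this on simulated data. The toy columns only borrow the report's naming pattern (`goal…`, `pf_…`, `ef_…`); their values are hypothetical, and Spearman is chosen here only as an example.

```r
# Toy data (NOT the report's data): three "goal" columns and two other variables
set.seed(42)
n <- 200
latent <- rnorm(n)
toy <- data.frame(
  goal1    =  latent + rnorm(n, sd = 0.5),
  goal3    =  latent + rnorm(n, sd = 0.5),
  goal12   = -latent + rnorm(n, sd = 0.5),   # built to move against the other goals
  pf_law   =  latent + rnorm(n, sd = 1),
  ef_trade =  rnorm(n)
)

cor_all <- cor(toy, method = "spearman")

# Logical index of the goal columns, based on their name prefix
is_goal <- grepl("^goal", colnames(cor_all))

goals_vs_goals <- cor_all[is_goal, is_goal]     # block 1: goals on goals
goals_vs_other <- cor_all[is_goal, !is_goal]    # block 2: goals on other variables
other_vs_other <- cor_all[!is_goal, !is_goal]   # block 3: other variables among themselves

print(round(goals_vs_goals, 2))
print(round(goals_vs_other, 2))
```

Each block can then be melted and passed to `geom_tile()` in the same way as the full matrix.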

In order to have an overview of the relationship between our independent variables and the overall SDG score, we produce several plot matrices showing the Pearson correlation coefficient for each pair of variables, the scatter plots describing their relationship, and the distribution of each variable.

Code
#### Pearson's correlation coeff ####

panel.hist <- function(x, ...){ 
  usr <- par("usr"); on.exit(par(usr)) 
  par(usr = c(usr[1:2], 0, 1.5) ) 
  h <- hist(x, plot = FALSE) 
  breaks <- h$breaks; nB <- length(breaks) 
  y <- h$counts; y <- y/max(y) 
  rect(breaks[-nB], 0, breaks[-1], y, col = "lightgreen", ...)
}
panel.cor <- function(x, y, digits = 2, prefix = "", cex.cor, ...){ 
  usr <- par("usr"); on.exit(par(usr)) 
  par(usr = c(0, 1, 0, 1)) 
  r <- (cor(x, y)) 
  txt <- format(c(r, 0.123456789), digits = digits)[1] 
  txt <- paste0(prefix, txt) 
  if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt) 
  text(0.5, 0.5, txt, cex = cex.cor * abs(r))  # abs() keeps the text size positive for negative correlations
}

# Independent variables 
#with log transformation
pairs(Correlation_overall_skew[,c("overallscore", "unemployment.rate", "GDPpercapita", "MilitaryExpenditurePercentGDP", "internet_usage")], upper.panel=panel.cor, diag.panel=panel.hist, main="Correlation table and distribution of various variables")
#with square root transformation
pairs(Correlation_overall_sqrt[,c("overallscore", "unemployment.rate", "GDPpercapita", "MilitaryExpenditurePercentGDP", "internet_usage")], upper.panel=panel.cor, diag.panel=panel.hist, main="Correlation table and distribution of various variables")

The overall SDG achievement score is highly correlated with the percentage of people using the internet (r=.79) and with GDP per capita (r=.60). The unemployment rate and military expenditure as a percentage of GDP do not appear to be correlated with it. However, this holds only for the overall score.
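
A correlation coefficient read off a plot matrix can be complemented with a confidence interval and a test. The following is a minimal sketch using base R's `cor.test()` on simulated data. The two vectors reuse the report's variable names for readability, but their values and the linear relationship between them are assumptions made for illustration only.

```r
# Simulated data (NOT the report's data): a score that rises with internet usage
set.seed(7)
n <- 150
internet_usage <- runif(n, min = 0, max = 100)
overallscore   <- 40 + 0.4 * internet_usage + rnorm(n, sd = 8)

# Pearson correlation with its 95% confidence interval and p-value
res <- cor.test(overallscore, internet_usage, method = "pearson")

print(res$estimate)   # the correlation coefficient r
print(res$conf.int)   # 95% confidence interval for r
print(res$p.value)
```

A coefficient whose confidence interval includes 0 would support the reading that a variable is not linearly related to the score.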

Code
#with log transformation
pairs(Correlation_overall_skew[,c("overallscore", "pf_law", "pf_security", "pf_movement", "pf_religion", "pf_assembly", "pf_expression", "pf_identity")], upper.panel=panel.cor, diag.panel=panel.hist, main="Correlation table and distribution of personal freedom variables")

#with square root transformation
pairs(Correlation_overall_sqrt[,c("overallscore", "pf_law", "pf_security", "pf_movement", "pf_religion", "pf_assembly", "pf_expression", "pf_identity")], upper.panel=panel.cor, diag.panel=panel.hist, main="Correlation table and distribution of personal freedom variables")

The overall SDG achievement score is highly correlated with “personal freedom: law” (r=.69) and “personal freedom: identity” (r=.62). The other dimensions of personal freedom do not seem to have an important influence. Regarding the distribution of the personal freedom variables, we notice that, except for law, all are left-skewed (concentrated at the high end of the scale), meaning that most countries have high scores.

Code
#with log transformation
pairs(Correlation_overall_skew[,c("overallscore", "ef_government", "ef_legal", "ef_money", "ef_trade", "ef_regulation")], upper.panel=panel.cor, diag.panel=panel.hist, main="Correlation table and distribution of economic freedom variables")
#with square root transformation
pairs(Correlation_overall_sqrt[,c("overallscore", "ef_government", "ef_legal", "ef_money", "ef_trade", "ef_regulation")], upper.panel=panel.cor, diag.panel=panel.hist, main="Correlation table and distribution of economic freedom variables")

The overall SDG achievement score is highly correlated with “economic freedom: legal” (r=.77), “economic freedom: trade” (r=.67) and “economic freedom: money” (r=.6), while the other dimensions of economic freedom do not seem to have an important influence. Regarding the distribution of the economic freedom variables, we notice more heterogeneous distributions and scores across countries than for personal freedom.

Code
#### PCA ####

# for goals
myPCA_g <- PCA(data_question1[,9:24], graph = FALSE)
plot(myPCA_g$ind$coord[, 1], myPCA_g$ind$coord[, 2], xlab = "PC1", ylab = "PC2", main = "PCA Plot SDG Goals", pch = 19, col = "blue", cex = 0.5)
# Base graphics calls are issued one after another, not chained with `+`
abline(h = 0, col = "red", lty = 2)
abline(v = 0, col = "red", lty = 2)
plot.PCA(myPCA_g, choix = "var", pch = 10, cex = 0.6)

summary(myPCA_g)
#> 
#> Call:
#> PCA(X = data_question1[, 9:24], graph = FALSE) 
#> 
#> 
#> Eigenvalues
#>                        Dim.1   Dim.2   Dim.3   Dim.4   Dim.5   Dim.6
#> Variance               9.837   1.406   0.963   0.820   0.718   0.513
#> % of var.             61.479   8.786   6.020   5.125   4.490   3.205
#> Cumulative % of var.  61.479  70.265  76.285  81.410  85.899  89.104
#>                        Dim.7   Dim.8   Dim.9  Dim.10  Dim.11  Dim.12
#> Variance               0.352   0.328   0.264   0.195   0.147   0.134
#> % of var.              2.203   2.050   1.651   1.220   0.921   0.838
#> Cumulative % of var.  91.307  93.357  95.009  96.228  97.149  97.987
#>                       Dim.13  Dim.14  Dim.15  Dim.16
#> Variance               0.111   0.090   0.063   0.059
#> % of var.              0.692   0.559   0.395   0.367
#> Cumulative % of var.  98.678  99.238  99.633 100.000
#> 
#> Individuals (the 10 first)
#>            Dist    Dim.1    ctr   cos2    Dim.2    ctr   cos2  
#> 1      |  5.028 | -4.565  0.085  0.824 | -0.040  0.000  0.000 |
#> 2      |  5.013 | -4.514  0.083  0.811 | -0.016  0.000  0.000 |
#> 3      |  4.909 | -4.618  0.087  0.885 | -0.173  0.001  0.001 |
#> 4      |  4.838 | -4.531  0.084  0.877 | -0.098  0.000  0.000 |
#> 5      |  4.793 | -4.497  0.082  0.880 | -0.123  0.000  0.001 |
#> 6      |  4.733 | -4.472  0.081  0.893 | -0.137  0.001  0.001 |
#> 7      |  4.673 | -4.252  0.074  0.828 |  0.076  0.000  0.000 |
#> 8      |  4.617 | -4.162  0.070  0.813 |  0.118  0.000  0.001 |
#> 9      |  4.219 | -3.781  0.058  0.803 | -0.168  0.001  0.002 |
#> 10     |  4.103 | -3.857  0.061  0.884 | -0.329  0.003  0.006 |
#>         Dim.3    ctr   cos2  
#> 1       0.894  0.033  0.032 |
#> 2       0.970  0.039  0.037 |
#> 3       0.412  0.007  0.007 |
#> 4       0.545  0.012  0.013 |
#> 5       0.469  0.009  0.010 |
#> 6       0.296  0.004  0.004 |
#> 7       0.908  0.034  0.038 |
#> 8       0.981  0.040  0.045 |
#> 9       0.967  0.039  0.053 |
#> 10      0.363  0.005  0.008 |
#> 
#> Variables (the 10 first)
#>           Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3    ctr
#> goal1  |  0.860  7.522  0.740 |  0.210  3.127  0.044 | -0.121  1.522
#> goal2  |  0.668  4.531  0.446 |  0.141  1.416  0.020 | -0.097  0.983
#> goal3  |  0.946  9.093  0.894 |  0.155  1.708  0.024 | -0.103  1.107
#> goal4  |  0.859  7.505  0.738 |  0.302  6.478  0.091 |  0.005  0.003
#> goal5  |  0.721  5.278  0.519 |  0.064  0.288  0.004 |  0.378 14.807
#> goal6  |  0.918  8.569  0.843 |  0.066  0.310  0.004 |  0.046  0.224
#> goal7  |  0.867  7.644  0.752 |  0.285  5.787  0.081 |  0.005  0.003
#> goal8  |  0.782  6.224  0.612 | -0.170  2.065  0.029 | -0.129  1.723
#> goal9  |  0.892  8.092  0.796 | -0.150  1.610  0.023 | -0.047  0.228
#> goal10 |  0.542  2.981  0.293 | -0.521 19.303  0.271 | -0.345 12.366
#>          cos2  
#> goal1   0.015 |
#> goal2   0.009 |
#> goal3   0.011 |
#> goal4   0.000 |
#> goal5   0.143 |
#> goal6   0.002 |
#> goal7   0.000 |
#> goal8   0.017 |
#> goal9   0.002 |
#> goal10  0.119 |
myPCA_g$eig
#>         eigenvalue percentage of variance
#> comp 1      9.8366                 61.479
#> comp 2      1.4058                  8.786
#> comp 3      0.9632                  6.020
#> comp 4      0.8199                  5.125
#> comp 5      0.7183                  4.490
#> comp 6      0.5128                  3.205
#> comp 7      0.3525                  2.203
#> comp 8      0.3280                  2.050
#> comp 9      0.2642                  1.651
#> comp 10     0.1951                  1.220
#> comp 11     0.1473                  0.921
#> comp 12     0.1341                  0.838
#> comp 13     0.1106                  0.692
#> comp 14     0.0895                  0.559
#> comp 15     0.0632                  0.395
#> comp 16     0.0587                  0.367
#>         cumulative percentage of variance
#> comp 1                               61.5
#> comp 2                               70.3
#> comp 3                               76.3
#> comp 4                               81.4
#> comp 5                               85.9
#> comp 6                               89.1
#> comp 7                               91.3
#> comp 8                               93.4
#> comp 9                               95.0
#> comp 10                              96.2
#> comp 11                              97.1
#> comp 12                              98.0
#> comp 13                              98.7
#> comp 14                              99.2
#> comp 15                              99.6
#> comp 16                             100.0

Concerning the SDG goals, most of the variables load on the first component, except goals 10 and 15, which are less strongly correlated with dimension 1. In addition, as seen before, goals 12 and 13 are negatively correlated with the other goals. Only the first two components have an eigenvalue greater than 1 (9.84 and 1.41), so according to the Kaiser-Guttman rule there are only 2 dimensions to take into account. Nevertheless, together they explain only 70.3% of the variance, which is less than 80%.
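
The Kaiser-Guttman rule and the cumulative share of variance can be checked in a few lines. The sketch below uses base R's `prcomp()` on simulated data instead of the `PCA()` call used in this report. The six variables `v1` to `v6` are hypothetical and are built from two latent factors, so two components are expected to pass the rule.

```r
# Simulated data (NOT the report's data): six variables driven by two latent factors
set.seed(123)
n  <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
toy <- data.frame(
  v1 = f1 + rnorm(n, sd = 0.4),
  v2 = f1 + rnorm(n, sd = 0.4),
  v3 = f1 + rnorm(n, sd = 0.4),
  v4 = f2 + rnorm(n, sd = 0.4),
  v5 = f2 + rnorm(n, sd = 0.4),
  v6 = f2 + rnorm(n, sd = 0.4)
)

# PCA on standardized variables, so the eigenvalues sum to the number of variables
pca <- prcomp(toy, center = TRUE, scale. = TRUE)

eigenvalues <- pca$sdev^2
pct_var     <- 100 * eigenvalues / sum(eigenvalues)
cum_pct     <- cumsum(pct_var)

# Kaiser-Guttman rule: keep the components whose eigenvalue exceeds 1
n_kaiser <- sum(eigenvalues > 1)

print(round(data.frame(eigenvalues, pct_var, cum_pct), 2))
print(n_kaiser)
```

Comparing `cum_pct[n_kaiser]` with a chosen threshold, such as the 80% used here, shows whether the retained components are enough.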

Code
#for HFI scores
myPCA_s <- PCA(data_question1[,29:40], graph = FALSE)
plot(myPCA_s$ind$coord[, 1], myPCA_s$ind$coord[, 2], xlab = "PC1", ylab = "PC2", main = "PCA Plot HFI Scores", pch = 19, col = "blue", cex = 0.5)
# Base graphics calls are issued one after another, not chained with `+`
abline(h = 0, col = "red", lty = 2)
abline(v = 0, col = "red", lty = 2)
plot.PCA(myPCA_s, choix = "var",cex = 0.5)
summary(myPCA_s)
#> 
#> Call:
#> PCA(X = data_question1[, 29:40], graph = FALSE) 
#> 
#> 
#> Eigenvalues
#>                        Dim.1   Dim.2   Dim.3   Dim.4   Dim.5   Dim.6
#> Variance               6.477   1.581   1.081   0.710   0.560   0.476
#> % of var.             53.979  13.171   9.010   5.920   4.666   3.966
#> Cumulative % of var.  53.979  67.150  76.160  82.080  86.746  90.713
#>                        Dim.7   Dim.8   Dim.9  Dim.10  Dim.11  Dim.12
#> Variance               0.305   0.230   0.207   0.181   0.116   0.075
#> % of var.              2.545   1.914   1.724   1.512   0.970   0.623
#> Cumulative % of var.  93.257  95.171  96.895  98.407  99.377 100.000
#> 
#> Individuals (the 10 first)
#>                   Dist    Dim.1    ctr   cos2    Dim.2    ctr   cos2
#> 1             |  5.714 | -4.642  0.133  0.660 |  0.939  0.022  0.027
#> 2             |  4.928 | -4.333  0.116  0.773 |  0.560  0.008  0.013
#> 3             |  4.964 | -4.238  0.111  0.729 |  1.063  0.029  0.046
#> 4             |  3.666 | -3.523  0.077  0.924 |  0.149  0.001  0.002
#> 5             |  3.589 | -3.355  0.070  0.874 |  0.102  0.000  0.001
#> 6             |  5.952 | -4.479  0.124  0.566 |  0.505  0.006  0.007
#> 7             |  4.722 | -3.585  0.079  0.576 | -0.592  0.009  0.016
#> 8             |  4.660 | -3.610  0.081  0.600 | -0.655  0.011  0.020
#> 9             |  4.717 | -3.680  0.084  0.608 | -0.736  0.014  0.024
#> 10            |  4.105 | -3.623  0.081  0.779 |  0.677  0.012  0.027
#>                  Dim.3    ctr   cos2  
#> 1             |  0.942  0.033  0.027 |
#> 2             |  0.749  0.021  0.023 |
#> 3             |  1.081  0.043  0.047 |
#> 4             |  0.471  0.008  0.017 |
#> 5             |  0.518  0.010  0.021 |
#> 6             | -2.619  0.254  0.194 |
#> 7             | -2.518  0.235  0.284 |
#> 8             | -2.430  0.218  0.272 |
#> 9             | -2.639  0.258  0.313 |
#> 10            | -1.023  0.039  0.062 |
#> 
#> Variables (the 10 first)
#>                  Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3
#> pf_law        |  0.856 11.301  0.732 | -0.311  6.109  0.097 | -0.118
#> pf_security   |  0.547  4.627  0.300 | -0.465 13.663  0.216 | -0.203
#> pf_movement   |  0.828 10.572  0.685 |  0.291  5.367  0.085 | -0.140
#> pf_religion   |  0.705  7.668  0.497 |  0.554 19.445  0.307 | -0.275
#> pf_assembly   |  0.820 10.391  0.673 |  0.444 12.492  0.197 | -0.196
#> pf_expression |  0.879 11.918  0.772 |  0.214  2.907  0.046 | -0.248
#> pf_identity   |  0.645  6.423  0.416 |  0.007  0.003  0.000 |  0.087
#> ef_government | -0.122  0.228  0.015 |  0.684 29.606  0.468 |  0.573
#> ef_legal      |  0.862 11.459  0.742 | -0.309  6.060  0.096 |  0.037
#> ef_money      |  0.692  7.401  0.479 | -0.157  1.559  0.025 |  0.487
#>                  ctr   cos2  
#> pf_law         1.277  0.014 |
#> pf_security    3.823  0.041 |
#> pf_movement    1.804  0.020 |
#> pf_religion    6.988  0.076 |
#> pf_assembly    3.561  0.038 |
#> pf_expression  5.707  0.062 |
#> pf_identity    0.692  0.007 |
#> ef_government 30.323  0.328 |
#> ef_legal       0.126  0.001 |
#> ef_money      21.925  0.237 |
myPCA_s$eig
#>         eigenvalue percentage of variance
#> comp 1      6.4775                 53.979
#> comp 2      1.5806                 13.171
#> comp 3      1.0812                  9.010
#> comp 4      0.7104                  5.920
#> comp 5      0.5599                  4.666
#> comp 6      0.4760                  3.966
#> comp 7      0.3054                  2.545
#> comp 8      0.2297                  1.914
#> comp 9      0.2068                  1.724
#> comp 10     0.1814                  1.512
#> comp 11     0.1164                  0.970
#> comp 12     0.0748                  0.623
#>         cumulative percentage of variance
#> comp 1                               54.0
#> comp 2                               67.2
#> comp 3                               76.2
#> comp 4                               82.1
#> comp 5                               86.7
#> comp 6                               90.7
#> comp 7                               93.3
#> comp 8                               95.2
#> comp 9                               96.9
#> comp 10                              98.4
#> comp 11                              99.4
#> comp 12                             100.0

Concerning the Human Freedom Index scores, most of the variables are positively correlated with dimension 1, slightly less so for PF religion and PF security, and the EF government variable is essentially uncorrelated with dimension 1. The first three components have an eigenvalue greater than 1 (6.48, 1.58 and 1.08), so we conclude that there are 3 dimensions to take into account. Nevertheless, again, together they explain only 76.2% of the variance, which is less than 80%.
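
When a PCA is run on standardized variables, the coordinate of a variable on a dimension equals the correlation between that variable and the component, which is why the coordinates can be read as correlations. The sketch below checks this on simulated data with base R's `prcomp()` rather than the `PCA()` call used in this report. The variable names are hypothetical.

```r
# Simulated data (NOT the report's data): three related variables and one unrelated
set.seed(99)
n <- 250
latent <- rnorm(n)
toy <- data.frame(
  freedom_a = latent + rnorm(n, sd = 0.5),
  freedom_b = latent + rnorm(n, sd = 0.7),
  freedom_c = latent + rnorm(n, sd = 1.0),
  unrelated = rnorm(n)
)

pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Correlation between each original variable and each component score
cor_with_pcs <- cor(toy, pca$x)

# Variable coordinates: loadings multiplied by the component standard deviations
coords <- sweep(pca$rotation, 2, pca$sdev, FUN = "*")

print(round(cor_with_pcs[, 1:2], 2))
print(max(abs(cor_with_pcs - coords)))  # numerically zero: both matrices agree
```

The sign of a component is arbitrary, so only the absolute size of a coordinate and its sign relative to the other variables carry meaning.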

Code
#### Kmean clustering ####

data1_scaled <- scale(Correlation_overall)
rownames(data1_scaled) <- seq_along(row.names(data1_scaled))
fviz_nbclust(data1_scaled, kmeans, method="wss")
kmean <- kmeans(data1_scaled, 7, nstart = 25)
print(kmean)
#> K-means clustering with 7 clusters of sizes 649, 328, 415, 417, 362, 286, 42
#> 
#> Cluster means:
#>   population overallscore  goal1    goal2   goal3  goal4   goal5
#> 1    -0.1175     -1.36318 -1.449 -0.81472 -1.3936 -1.385 -0.8789
#> 2    -0.0541      0.17641  0.544  0.00762  0.2521  0.149 -0.3879
#> 3    -0.2228      0.90857  0.782  0.66143  0.8405  0.795  0.4841
#> 4    -0.0441     -0.00632  0.188  0.17283  0.1959  0.371  0.1977
#> 5    -0.0600      1.23573  0.801  0.88135  1.1750  0.849  1.1963
#> 6    -0.2757      0.07437  0.277 -0.56992 -0.1056  0.133 -0.0352
#> 7     7.2721     -0.38531 -0.247  0.56278 -0.0921  0.469 -0.2078
#>     goal6  goal7   goal8  goal9 goal10 goal11 goal12  goal13  goal15
#> 1 -1.2223 -1.384 -0.8171 -0.981 -0.415 -1.345  0.957  0.7707  0.0747
#> 2 -0.1950  0.295 -0.3245 -0.115  0.383  0.195  0.279 -0.0591 -0.5520
#> 3  0.8673  0.800  0.7735  0.695  0.636  0.771 -0.716 -0.4075  0.6050
#> 4  0.1533  0.275 -0.0246 -0.394 -0.955  0.237  0.380  0.5083 -0.6663
#> 5  1.3303  0.855  1.4579  1.716  1.020  1.061 -1.788 -1.6815  0.4901
#> 6 -0.0731  0.183 -0.7134 -0.268 -0.202  0.131  0.150  0.2447  0.1412
#> 7 -0.6498 -0.172  0.0534  0.128 -0.800 -0.754  0.725  0.3586 -1.3909
#>   goal16   goal17 unemployment.rate GDPpercapita
#> 1 -1.013 -0.88702           -0.4506       -0.614
#> 2 -0.101  0.25031           -0.0371       -0.316
#> 3  0.795 -0.00349            0.2777        0.252
#> 4 -0.517 -0.07079           -0.3422       -0.435
#> 5  1.524  0.85550           -0.3224        2.003
#> 6  0.187  0.91913            1.6081       -0.438
#> 7 -0.684 -1.14336           -0.2663       -0.497
#>   MilitaryExpenditurePercentGDP internet_usage pf_law pf_security
#> 1                        -0.131        -0.9411 -0.832     -0.4806
#> 2                         0.879        -0.0108 -0.490     -0.0541
#> 3                         0.108         0.6929  0.802      0.7241
#> 4                        -0.517        -0.2850 -0.543     -0.7438
#> 5                        -0.314         1.3701  1.602      0.9262
#> 6                         0.226        -0.1110  0.141      0.0120
#> 7                         0.403        -0.4433 -0.626      0.0151
#>   pf_movement pf_religion pf_assembly pf_expression pf_identity
#> 1      -0.605      -0.165      -0.490       -0.5657      -0.932
#> 2      -1.148      -1.736      -1.515       -1.2681      -0.653
#> 3       0.658       0.546       0.703        0.7639       0.750
#> 4       0.316       0.427       0.325        0.0394       0.304
#> 5       0.908       0.748       0.918        1.3118       0.879
#> 6       0.326       0.309       0.397        0.0165       0.197
#> 7      -1.378      -2.071      -1.379       -0.7129       0.156
#>   ef_government ef_legal ef_money ef_trade ef_regulation
#> 1        0.0485  -0.9672  -0.9082  -1.0115        -0.727
#> 2       -0.1659  -0.4117  -0.2798  -0.4079        -0.274
#> 3       -0.2337   0.6501   0.7269   0.8652         0.362
#> 4        0.9587  -0.3821   0.1523   0.1605        -0.238
#> 5       -0.7281   1.7424   0.9840   1.0330         1.105
#> 6        0.0255   0.0860  -0.0965   0.0753         0.495
#> 7       -0.5607  -0.0729  -0.2994  -0.7427        -0.731
#> 
#> Clustering vector:
#>    [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 6 6 6 6 6 6 6 6 6 6
#>   [32] 6 6 6 6 6 6 6 6 3 3 6 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#>   [63] 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 3 3 4 4 6 6 6 6 6 6 6 6 6
#>   [94] 2 6 6 6 6 6 6 6 6 6 3 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
#>  [125] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 2 2 2 2 2 2 2 2
#>  [156] 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#>  [187] 1 1 1 5 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 1 1 1 1 1 1 1
#>  [218] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#>  [249] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 6 6 6 6 6 6
#>  [280] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
#>  [311] 6 6 6 6 6 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
#>  [342] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
#>  [373] 6 6 6 6 6 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 5 5 5 5
#>  [404] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
#>  [435] 5 5 5 5 5 5 5 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 7 7 7
#>  [466] 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 1 1 1 1 1 1 1 1 1 1 1 1 1
#>  [497] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#>  [528] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 4 4 4 4 4 4 4 4 4 4
#>  [559] 4 4 4 4 4 4 4 4 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#>  [590] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 5 5 3 5 5 5 5 5 5 5 5 5 5 5
#>  [621] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
#>  [652] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 2 2 2 2 2 2 2 2 2 2
#>  [683] 2 2 2 2 2 2 2 2 2 2 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
#>  [714] 4 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3
#>  [745] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 5 5 5 5 5
#>  [776] 5 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 5 5 5 5 5 5 5 5
#>  [807] 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 2 2 2 2 2 2 2 6 6 6 6
#>  [838] 6 6 6 3 3 3 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 1 1 1 1 1 1 1
#>  [869] 1 6 6 6 6 6 6 6 6 6 6 6 6 6 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
#>  [900] 5 5 5 5 4 4 4 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 3 3 3 1 1 1 1 1 1
#>  [931] 1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#>  [962] 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
#>  [993] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#> [1024] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4
#> [1055] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 2 7 7 7 7 7 7 7 7 7 7 7 7 7 7
#> [1086] 7 7 7 7 7 7 7 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 2 2 2
#> [1117] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3
#> [1148] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4
#> [1179] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 2 2 2 2 2 2 2 2 2 2 2 2
#> [1210] 2 2 2 2 2 2 2 2 2 5 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 4
#> [1241] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1
#> [1272] 1 1 1 4 4 1 4 4 4 4 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 4 2 2
#> [1303] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 4 2 2 2 2 2 2
#> [1334] 2 2 2 2 2 4 4 4 4 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 6 6 6 6 6 6 6
#> [1365] 6 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 5 5 5 5 5 5 5 5 5
#> [1396] 5 5 5 5 5 5 5 5 5 5 5 5 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#> [1427] 3 3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 4 4 4 4 6 6 6 6
#> [1458] 6 6 6 6 6 6 6 6 3 3 3 3 3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [1489] 1 1 1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6
#> [1520] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [1551] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 6 6 6 6 6 6
#> [1582] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
#> [1613] 6 6 6 6 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [1644] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
#> [1675] 4 4 4 4 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2
#> [1706] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 4 4 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6
#> [1737] 6 6 6 6 6 6 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [1768] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 4 4 4 4 4 4 4 4 4 4 4
#> [1799] 4 4 4 4 4 2 2 2 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
#> [1830] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 1 1 1 1 1 1 1 1 1 1 1 1
#> [1861] 1 1 4 4 4 6 6 4 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4
#> [1892] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
#> [1923] 4 4 4 4 4 4 4 4 4 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [1954] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#> [1985] 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
#> [2016] 4 4 4 4 4 4 4 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2
#> [2047] 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2
#> [2078] 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1
#> [2109] 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
#> [2140] 4 4 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 3 3 3 3 3 3 3 3 3 3 3
#> [2171] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#> [2202] 3 5 5 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 1 1 1 1 1 1
#> [2233] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [2264] 1 1 1 1 1 4 4 4 4 4 4 4 4 4 4 4 4 4 4 2 2 2 2 2 4 2 2 2 2 2 2
#> [2295] 2 2 2 2 2 2 6 6 6 6 6 6 6 6 6 2 2 2 2 2 2 2 2 2 2 6 2 2 2 2 2
#> [2326] 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [2357] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 6 6 6 6 6 3 3 3 3 3 3 3 3 3
#> [2388] 3 3 3 3 3 3 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 2 2 2
#> [2419] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 6 6 6 6 6 6 6 6 6 6 6 6 6
#> [2450] 6 6 6 6 6 6 6 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [2481] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> 
#> Within cluster sum of squares by cluster:
#> [1] 10371  5910  4023  4976  2844  4594   750
#>  (between_SS / total_SS =  60.6 %)
#> 
#> Available components:
#> 
#> [1] "cluster"      "centers"      "totss"        "withinss"    
#> [5] "tot.withinss" "betweenss"    "size"         "iter"        
#> [9] "ifault"
fviz_cluster(kmean, data=data1_scaled, repel=FALSE, depth =NULL, ellipse.type = "norm", labelsize = 0, pointsize = 0.5)

### Open question: should we cluster by country instead, taking the mean of every variable over the years concerned?

Because of the large number of observations, the visualization of the k-means clusters is not very informative. In addition, clustering aims to produce groups that differ from each other while the observations within the same cluster vary little. Here, only 60.6% of the total variance is explained by the variation between clusters (between_SS / total_SS), which is not enough.
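The between-cluster share quoted above can be recomputed directly from a `kmeans` object. The sketch below is only an illustration on simulated data: `toy`, the number of centers and the seed are assumptions, not the report's `data1_scaled` or its settings.

```r
# Simulated, scaled data standing in for the real scaled data set (assumption)
set.seed(123)
toy <- scale(matrix(rnorm(300), ncol = 3))

# k-means with several random starts to reduce the risk of a poor local optimum
km <- kmeans(toy, centers = 4, nstart = 10)

# Share of the total sum of squares explained by the separation between clusters;
# this is the "between_SS / total_SS" figure printed by kmeans
between_share <- km$betweenss / km$totss
round(100 * between_share, 1)
```

A value close to 100% would indicate compact, well-separated clusters; the total sum of squares always splits into the between-cluster part and the total within-cluster part.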

3.3 Focus on the influence of events over the SDG scores

To get an overview of the relationship between the event variables and the overall SDG score, we draw pair plots that show the Pearson correlation coefficient between each pair of variables, the scatter plot of each pair, and the distribution of each variable.

Code
lower.panel <- function(x, y, ...){
  points(x, y, pch = 20, col = "black", cex = 0.2)
}

evaluateCorrelationStars <- function(correlation) {
  if (abs(correlation) >= 0.7) {
    return("*****")  # Strong correlation: 5 stars
  } else if (abs(correlation) >= 0.5) {
    return("****")   # Moderate correlation: 4 stars
  } else if (abs(correlation) >= 0.3) {
    return("***")    # Fair correlation: 3 stars
  } else if (abs(correlation) >= 0.1) {
    return("**")     # Weak correlation: 2 stars
  } else {
    return("*")      # Very weak correlation: 1 star
  }
}

# panel.cor function with stars alongside correlation coefficients
panel.cor_stars <- function(x, y, digits = 2, prefix = "", cex.cor, ...) {
  usr <- par("usr"); on.exit(par(usr)) 
  par(usr = c(0, 1, 0, 1)) 
  r <- cor(x, y)
  stars <- evaluateCorrelationStars(r)
  txt <- paste0(format(c(r, 0.123456789), digits = digits)[1], " ", stars)
  if(missing(cex.cor)) cex.cor <- 0.5/strwidth(txt)
  text(0.5, 0.5, txt, cex = cex.cor)
}

pairs(data_question3_1[, c("overallscore", "total_affected", "total_deaths")], upper.panel = panel.cor_stars,diag.panel = panel.hist,lower.panel = lower.panel, main = "Correlation table and distribution of Disaster variables")
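To make the star scale used in the panels concrete, here is a compact, vectorized sketch of the same thresholds (0.1, 0.3, 0.5, 0.7). The name `correlation_stars` is hypothetical and is not used elsewhere in the report; it is meant to mirror `evaluateCorrelationStars`, not to replace it.

```r
# Vectorized version of the star scale: one star below 0.1, five stars from 0.7 up
correlation_stars <- function(r) {
  breaks <- c(0.1, 0.3, 0.5, 0.7)
  # findInterval counts how many thresholds are <= |r|
  n_stars <- findInterval(abs(r), breaks) + 1
  strrep("*", n_stars)
}

stars_example <- correlation_stars(c(-0.85, 0.5, 0.29, 0.05))
stars_example
```

The sign of the coefficient is ignored on purpose, as in the original function: the stars describe the strength of the relationship, not its direction.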

The variables used to capture the impact of climate disasters do not seem to have a strong influence on the overall score. We will nevertheless explore the individual SDGs, since we believe that such disasters affect some goals specifically.

Code
lower.panel <- function(x, y, ...){
  points(x, y, pch = 20, col = "black", cex = 0.2)
}

evaluateCorrelationStars <- function(correlation) {
  if (abs(correlation) >= 0.7) {
    return("*****")  
  } else if (abs(correlation) >= 0.5) {
    return("****")
  } else if (abs(correlation) >= 0.3) {
    return("***") 
  } else if (abs(correlation) >= 0.1) {
    return("**")
  } else {
    return("*")
  }
}

# panel.cor function with stars alongside correlation coefficients
panel.cor_stars <- function(x, y, digits = 2, prefix = "", cex.cor, ...) {
  usr <- par("usr"); on.exit(par(usr)) 
  par(usr = c(0, 1, 0, 1)) 
  r <- cor(x, y)
  stars <- evaluateCorrelationStars(r)
  txt <- paste0(format(c(r, 0.123456789), digits = digits)[1], " ", stars)
  if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
  text(0.5, 0.5, txt, cex = cex.cor)
}

pairs(data_question3_2[,c("overallscore", "cases_per_million", "deaths_per_million", "stringency")], upper.panel = panel.cor_stars, diag.panel=panel.hist, lower.panel = lower.panel,main="Correlation table and distribution of COVID variables")

The variables used to capture the impact of COVID-19 do not seem to have a strong influence on the overall score. We will nevertheless explore the individual SDGs, since we believe that COVID-19 affected some goals specifically, for instance “good health and well-being” or “decent work and economic growth”.

Code
lower.panel <- function(x, y, ...){
  points(x, y, pch = 20, col = "black", cex = 0.5)
}
evaluateCorrelationStars <- function(correlation) {
  if (abs(correlation) >= 0.7) {
    return("*****") 
  } else if (abs(correlation) >= 0.5) {
    return("****")
  } else if (abs(correlation) >= 0.3) {
    return("***") 
  } else if (abs(correlation) >= 0.1) {
    return("**")
  } else {
    return("*")
  }
}

# panel.cor function with stars alongside correlation coefficients
panel.cor_stars <- function(x, y, digits = 2, prefix = "", cex.cor, ...) {
  usr <- par("usr"); on.exit(par(usr)) 
  par(usr = c(0, 1, 0, 1)) 
  r <- cor(x, y)
  stars <- evaluateCorrelationStars(r)
  txt <- paste0(format(c(r, 0.123456789), digits = digits)[1], " ", stars)
  if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
  text(0.5, 0.5, txt, cex = cex.cor)
}

pairs(data_question3_3[,c("overallscore", "ongoing", "sum_deaths", "pop_affected", "area_affected", "maxintensity")], upper.panel = panel.cor_stars, diag.panel=panel.hist, lower.panel = lower.panel, main="Correlation table and distribution of conflicts variables")

The variables used to capture the impact of conflicts do not seem to have a strong influence on the overall score. We will nevertheless explore the individual SDGs, since we believe that conflicts affect some goals specifically.

To explore our data on events such as disasters, COVID-19 and conflicts, we first have to see which regions are the most affected by them. To do so, we run a time-series analysis of each of these three events, using different variables each time.

Code
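# Convert the 'year' column to Date format.
# Note: with format = "%Y" alone, as.Date() fills in the current month and day,
# so the resulting dates (and the date filters below) depend on the day the script is run.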
Q3.1$year <- as.Date(as.character(Q3.1$year), format = "%Y")
Q3.2$year <- as.Date(as.character(Q3.2$year), format = "%Y")
Q3.3$year <- as.Date(as.character(Q3.3$year), format = "%Y")
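Since `as.Date()` with only `%Y` takes the month and day from the current date, one way to make the conversion reproducible is to anchor every year on a fixed day. This is a minimal sketch, assuming the `year` column holds plain integer years; `year_to_date` and `years` are hypothetical names, not part of the report.

```r
# Turn integer years into Dates anchored on a fixed month and day (default: January 1st)
year_to_date <- function(years, month_day = "01-01") {
  as.Date(paste0(years, "-", month_day))
}

years <- c(2018, 2019, 2022)
dates <- year_to_date(years)
format(dates, "%Y-%m-%d")
```

The choice of anchor day changes which rows pass a date threshold such as `as.Date("2018-12-12")`, so any filter on the converted column should be checked against the anchor that is used.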

This is our time-series analysis of COVID-19 cases per million by region, from the end of 2018 to 2022.

Code
library(ggplot2)
covid_filtered <- Q3.2[Q3.2$year >= as.Date("2018-12-12"), ]

ggplot(data = covid_filtered, aes(x = year, y = cases_per_million, group = region, color = region)) +
  geom_smooth(method = "loess",  se = FALSE, span = 0.8, size = 0.5) + 
  labs(title = "Trend of COVID-19 Cases per Million Over Time",
       x = "Year", y = "Cases per Million") +
  facet_wrap(~ region, ncol = 3) +
  theme( axis.text.x = element_text(angle = 45, size = 8, hjust = 1),
         axis.text.y = element_text(vjust = 1, size = 8, hjust = 1),
         plot.title = element_text(margin = margin(b = 20), hjust = 0.5, 
                                   vjust = 8, lineheight = 2),
         strip.text = element_blank(),
         panel.spacing = unit(0.5, "lines")
  ) +
  theme(legend.position = "bottom") +
  guides(color = guide_legend(nrow = 3))

This is our time-series analysis of COVID-19 deaths per million by region, from the end of 2018 to 2022.

Code
ggplot(data = covid_filtered, aes(x = year, y = deaths_per_million, group = region, color = region)) +
  geom_smooth(method = "loess",  se = FALSE, span = 0.8, size = 0.5) + 
  labs(title = "Trend of COVID-19 Deaths per Million Over Time", x = "Year", y = "Deaths per Million") +
  facet_wrap(~ region, nrow = 3) +
  theme( axis.text.x = element_text(angle = 45, size = 8, hjust = 1),
         axis.text.y = element_text(vjust = 1, size = 8, hjust = 1),
         plot.title = element_text(margin = margin(b = 20), hjust = 0.5, 
                                   vjust = 8, lineheight = 2),
         strip.text = element_blank(),
         panel.spacing = unit(0.5, "lines")
  ) +
  theme(legend.position = "bottom") +
  guides(color = guide_legend(nrow = 3))

This is our time-series analysis of COVID-19 stringency by region, from the end of 2018 to 2022.

Code
ggplot(data = covid_filtered, aes(x = year, y = stringency, group = region, 
                                  color = region)) +
  geom_smooth(method = "loess",  se = FALSE, span = 0.7, size = 0.5) + 
  labs(title = "Trend of COVID-19 Stringency Over Time", x = "Year", y = "Stringency") +
  facet_wrap(~ region, nrow = 4) +
  theme( axis.text.x = element_text(angle = 45, size = 8, hjust = 1),
         axis.text.y = element_text(vjust = 1, size = 8, hjust = 1),
         plot.title = element_text(margin = margin(b = 20), hjust = 0.5, 
                                   vjust = 8, lineheight = 2),
         strip.text = element_blank(),
         panel.spacing = unit(0.5, "lines")
  ) +
  theme(legend.position = "right") +
  guides(color = guide_legend(ncol = 1))

This is our time-series analysis of climate disasters, measured by the total number of people affected per region.

Code
Q3.1[is.na(Q3.1)] <- 0
ggplot(data = Q3.1, aes(x = year, y = total_affected, group = region, color = region)) +
  geom_smooth(method = "loess",  se = FALSE, span = 0.7, size = 0.5) + 
  labs(title = "Trend of Total Affected from Climatic Disasters Over Time", x = "Year", y = "Total Affected") +
  facet_wrap(~ region, nrow = 4) +
  theme( axis.text.x = element_text(angle = 45, size = 8, hjust = 1),
         axis.text.y = element_text(vjust = 1, size = 8, hjust = 1),
         plot.title = element_text(margin = margin(b = 20), hjust = 0.5, 
                                   vjust = 8, lineheight = 2),
         strip.text = element_blank(),
         panel.spacing = unit(0.5, "lines")
  ) +
  theme(legend.position = "right") +
  guides(color = guide_legend(ncol = 1))

This is our time-series analysis of conflict deaths by region between 2000 and 2016.

Code
conflicts_filtered <- Q3.3[Q3.3$year >= as.Date("2000-01-01") & Q3.3$year <= as.Date("2016-12-31"), ]

ggplot(data = conflicts_filtered, aes(x = year, y = sum_deaths, group = region, color = region)) +
  geom_smooth(method = "loess", se = FALSE, span = 0.3, size = 0.5) +  # Using loess smoothing method
  labs(title = "Trend of Deaths by Conflicts Over Time", x = "Year", y = "Sum Deaths") +
  facet_wrap(~ region, nrow = 4) +
  theme( axis.text.x = element_text(angle = 45, size = 8, hjust = 1),
         axis.text.y = element_text(vjust = 1, size = 8, hjust = 1),
         plot.title = element_text(margin = margin(b = 20), hjust = 0.5, 
                                   vjust = 8, lineheight = 2),
         strip.text = element_blank(),
         panel.spacing = unit(0.5, "lines")
  ) +
  theme(legend.position = "right") +
  guides(color = guide_legend(ncol = 1))

We can see that the regions most affected by conflict deaths are the Middle East & North Africa, Sub-Saharan Africa and South Asia, followed to a lesser extent by Latin America & the Caribbean and Eastern Europe.

This is our time-series analysis of the population affected by conflicts, by region, between 2000 and 2016.

Code
ggplot(data = conflicts_filtered, aes(x = year, y = pop_affected, group = region, color = region)) +
  geom_smooth(method = "loess", se = FALSE, span = 0.3, size = 0.5) +  # Using loess smoothing method
  labs(title = "Trend of Population Affected by Conflicts Over Time", x = "Year", y = "pop_affected") +
  facet_wrap(~ region, nrow = 4) +
  theme( axis.text.x = element_text(angle = 45, size = 8, hjust = 1),
         axis.text.y = element_text(vjust = 1, size = 8, hjust = 1),
         plot.title = element_text(margin = margin(b = 20), hjust = 0.5, 
                                   vjust = 8, lineheight = 2),
         strip.text = element_blank(),
         panel.spacing = unit(0.5, "lines")
  ) +
  theme(legend.position = "right") +
  guides(color = guide_legend(ncol = 1))

We can see that the regions with the largest populations affected by conflicts are the Middle East & North Africa, Sub-Saharan Africa, South Asia, Latin America & the Caribbean and Eastern Europe, and occasionally the Caucasus and Central Asia.

Now that we have visualized which regions are the most affected by these three types of events, we can run correlation analyses per region to see whether these events do have an impact on the evolution of the SDG scores.

Here we analyse the correlation between climate disasters and the SDG scores in South and East Asia.

Code
Q3.1[is.na(Q3.1)] <- 0

south_east_asia_data <- Q3.1[Q3.1$region %in% c("South Asia", "East Asia"), ]

relevant_columns <- c("goal1", "goal2", "goal3", "goal4", "goal5", "goal6", "goal7", "goal8", "goal9", "goal10", "goal11", "goal12", "goal13", "goal15", "goal16", "total_affected", "no_homeless")

correlation_matrix_disaster_Asia <- cor(south_east_asia_data[, relevant_columns], use = "complete.obs")

kable(correlation_matrix_disaster_Asia)
goal1 goal2 goal3 goal4 goal5 goal6 goal7 goal8 goal9 goal10 goal11 goal12 goal13 goal15 goal16 total_affected no_homeless
goal1 1.000 -0.026 0.322 0.394 0.186 0.358 0.402 0.537 0.203 0.577 0.170 -0.035 -0.073 0.450 0.125 -0.040 -0.050
goal2 -0.026 1.000 0.647 0.505 0.573 0.547 0.512 0.548 0.679 -0.205 0.520 -0.302 -0.321 -0.280 0.474 0.099 -0.076
goal3 0.322 0.647 1.000 0.789 0.588 0.703 0.826 0.806 0.864 -0.170 0.804 -0.747 -0.725 -0.212 0.719 -0.017 -0.105
goal4 0.394 0.505 0.789 1.000 0.605 0.497 0.630 0.610 0.656 -0.080 0.455 -0.580 -0.604 -0.103 0.373 0.093 -0.014
goal5 0.186 0.573 0.588 0.605 1.000 0.563 0.451 0.453 0.427 -0.100 0.529 -0.404 -0.450 -0.205 0.347 0.055 -0.152
goal6 0.358 0.547 0.703 0.497 0.563 1.000 0.667 0.625 0.693 -0.006 0.655 -0.578 -0.542 -0.135 0.582 -0.128 -0.207
goal7 0.402 0.512 0.826 0.630 0.451 0.667 1.000 0.702 0.760 -0.084 0.809 -0.536 -0.487 -0.208 0.548 -0.024 -0.060
goal8 0.537 0.548 0.806 0.610 0.453 0.625 0.702 1.000 0.741 0.189 0.642 -0.576 -0.563 -0.033 0.639 -0.012 -0.090
goal9 0.203 0.679 0.864 0.656 0.427 0.693 0.760 0.741 1.000 -0.115 0.671 -0.733 -0.730 -0.220 0.660 0.011 -0.067
goal10 0.577 -0.205 -0.170 -0.080 -0.100 -0.006 -0.084 0.189 -0.115 1.000 -0.306 0.182 0.158 0.608 -0.033 -0.150 -0.038
goal11 0.170 0.520 0.804 0.455 0.529 0.655 0.809 0.642 0.671 -0.306 1.000 -0.631 -0.557 -0.354 0.695 -0.123 -0.154
goal12 -0.035 -0.302 -0.747 -0.580 -0.404 -0.578 -0.536 -0.576 -0.733 0.182 -0.631 1.000 0.959 0.139 -0.732 0.112 0.116
goal13 -0.073 -0.321 -0.725 -0.604 -0.450 -0.542 -0.487 -0.563 -0.730 0.158 -0.557 0.959 1.000 0.069 -0.671 0.055 0.096
goal15 0.450 -0.280 -0.212 -0.103 -0.205 -0.135 -0.208 -0.033 -0.220 0.608 -0.354 0.139 0.069 1.000 0.022 -0.071 -0.022
goal16 0.125 0.474 0.719 0.373 0.347 0.582 0.548 0.639 0.660 -0.033 0.695 -0.732 -0.671 0.022 1.000 -0.146 -0.130
total_affected -0.040 0.099 -0.017 0.093 0.055 -0.128 -0.024 -0.012 0.011 -0.150 -0.123 0.112 0.055 -0.071 -0.146 1.000 0.147
no_homeless -0.050 -0.076 -0.105 -0.014 -0.152 -0.207 -0.060 -0.090 -0.067 -0.038 -0.154 0.116 0.096 -0.022 -0.130 0.147 1.000
Code

cor_melted <- as.data.frame(as.table(correlation_matrix_disaster_Asia))
names(cor_melted) <- c("Variable1", "Variable2", "Correlation")

ggplot(data = cor_melted, aes(Variable1, Variable2, fill = Correlation)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limit = c(-1, 1), space = "Lab",
                       name = "Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, size = 8, hjust = 1),
        axis.text.y = element_text(size = 8)) +
  coord_fixed() +
  labs(x = '', y = '',
       title = 'Correlation between the climate disasters and the SDG goals in South and East Asia')

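We conclude that climate disasters do not have a strong impact on the SDG scores in these regions: all correlations between the two disaster variables (total_affected and no_homeless) and the goal scores are weak, at most 0.21 in absolute value.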
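Only the rows of the correlation matrix that relate the event variables to the goals matter for this conclusion. The sketch below shows one way to extract that block and its largest absolute value. It runs on simulated data; `toy` and the helper `event_goal_cor` are hypothetical and not part of the report.

```r
# Simulated stand-in for the goal scores and disaster variables (hypothetical values)
set.seed(42)
toy <- data.frame(
  goal1 = rnorm(50),
  goal2 = rnorm(50),
  total_affected = rnorm(50),
  no_homeless = rnorm(50)
)
cor_mat <- cor(toy, use = "complete.obs")

# Keep only the event rows and the goal columns of a correlation matrix
event_goal_cor <- function(cor_mat, event_vars) {
  goal_vars <- setdiff(colnames(cor_mat), event_vars)
  cor_mat[event_vars, goal_vars, drop = FALSE]
}

sub <- event_goal_cor(cor_mat, c("total_affected", "no_homeless"))
round(sub, 3)

# Strongest event-goal correlation, regardless of sign
max(abs(sub))
```

Applied to a matrix such as `correlation_matrix_disaster_Asia`, the same call would return the last two rows of the table without the goal-to-goal correlations.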

Here we analyse the correlation between COVID-19 and the SDG scores, restricted to the COVID-19 period (2019 onwards).

Code
covid_filtered <- Q3.2[Q3.2$year >= as.Date("2019-01-01"), ]

relevant_columns <- c("goal1", "goal2", "goal3", "goal4", "goal5", "goal6", "goal7", "goal8", "goal9", "goal10", "goal11", "goal12", "goal13", "goal15", "goal16", "stringency", "cases_per_million", "deaths_per_million")
# Subset data with relevant columns for correlation analysis
relevant_data <- covid_filtered[, relevant_columns]

correlation_matrix_Covid <- cor(relevant_data, use = "complete.obs")

kable(correlation_matrix_Covid)
goal1 goal2 goal3 goal4 goal5 goal6 goal7 goal8 goal9 goal10 goal11 goal12 goal13 goal15 goal16 stringency cases_per_million deaths_per_million
goal1 1.000 0.534 0.867 0.777 0.445 0.763 0.798 0.584 0.781 0.497 0.727 -0.648 -0.553 0.099 0.714 0.056 0.341 0.361
goal2 0.534 1.000 0.560 0.541 0.469 0.605 0.469 0.636 0.569 0.240 0.463 -0.353 -0.284 0.122 0.451 0.088 0.206 0.242
goal3 0.867 0.560 1.000 0.829 0.641 0.836 0.845 0.693 0.881 0.456 0.828 -0.789 -0.669 0.152 0.825 0.040 0.412 0.373
goal4 0.777 0.541 0.829 1.000 0.656 0.764 0.803 0.596 0.773 0.309 0.758 -0.655 -0.558 0.058 0.674 0.113 0.349 0.339
goal5 0.445 0.469 0.641 0.656 1.000 0.663 0.606 0.587 0.645 0.098 0.690 -0.653 -0.564 0.203 0.628 0.060 0.330 0.261
goal6 0.763 0.605 0.836 0.764 0.663 1.000 0.765 0.711 0.811 0.366 0.766 -0.727 -0.583 0.262 0.729 0.069 0.389 0.398
goal7 0.798 0.469 0.845 0.803 0.606 0.765 1.000 0.556 0.740 0.323 0.793 -0.654 -0.494 0.123 0.697 0.055 0.340 0.374
goal8 0.584 0.636 0.693 0.596 0.587 0.711 0.556 1.000 0.695 0.387 0.587 -0.635 -0.556 0.283 0.627 0.024 0.356 0.278
goal9 0.781 0.569 0.881 0.773 0.645 0.811 0.740 0.695 1.000 0.462 0.753 -0.857 -0.760 0.189 0.819 0.074 0.460 0.353
goal10 0.497 0.240 0.456 0.309 0.098 0.366 0.323 0.387 0.462 1.000 0.281 -0.496 -0.469 0.215 0.519 -0.030 0.262 0.142
goal11 0.727 0.463 0.828 0.758 0.690 0.766 0.793 0.587 0.753 0.281 1.000 -0.696 -0.576 0.089 0.764 0.037 0.345 0.328
goal12 -0.648 -0.353 -0.789 -0.655 -0.653 -0.727 -0.654 -0.635 -0.857 -0.496 -0.696 1.000 0.876 -0.316 -0.825 0.013 -0.466 -0.292
goal13 -0.553 -0.284 -0.669 -0.558 -0.564 -0.583 -0.494 -0.556 -0.760 -0.469 -0.576 0.876 1.000 -0.205 -0.682 -0.018 -0.364 -0.166
goal15 0.099 0.122 0.152 0.058 0.203 0.262 0.123 0.283 0.189 0.215 0.089 -0.316 -0.205 1.000 0.303 -0.068 0.169 0.223
goal16 0.714 0.451 0.825 0.674 0.628 0.729 0.697 0.627 0.819 0.519 0.764 -0.825 -0.682 0.303 1.000 -0.023 0.425 0.316
stringency 0.056 0.088 0.040 0.113 0.060 0.069 0.055 0.024 0.074 -0.030 0.037 0.013 -0.018 -0.068 -0.023 1.000 0.041 0.336
cases_per_million 0.341 0.206 0.412 0.349 0.330 0.389 0.340 0.356 0.460 0.262 0.345 -0.466 -0.364 0.169 0.425 0.041 1.000 0.416
deaths_per_million 0.361 0.242 0.373 0.339 0.261 0.398 0.374 0.278 0.353 0.142 0.328 -0.292 -0.166 0.223 0.316 0.336 0.416 1.000
Code

cor_melted <- as.data.frame(as.table(correlation_matrix_Covid))
names(cor_melted) <- c("Variable1", "Variable2", "Correlation")

ggplot(data = cor_melted, aes(Variable1, Variable2, fill = Correlation)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limit = c(-1, 1), space = "Lab",
                       name = "Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, size = 8, hjust = 1),
        axis.text.y = element_text(size = 8)) +
  coord_fixed() +
  labs(x = '', y = '',
       title = 'Correlation between COVID and the SDG goals')

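The conclusion is similar for stringency, whose correlations with the goal scores never exceed 0.12 in absolute value. Cases and deaths per million show fair correlations with most goals (up to 0.47 in absolute value for cases), but these are mostly positive, which is surprising: higher case and death counts go with higher SDG scores, so the correlations cannot be read as a negative impact of COVID-19 on the goals.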

Here we analyse the correlation between conflict deaths and the SDG scores, only for the Middle East & North Africa, Sub-Saharan Africa, South Asia, Latin America & the Caribbean and Eastern Europe regions.

Code

# Filter data for specific regions
selected_regions <- c("Middle East & North Africa", "Sub-Saharan Africa", "South Asia", "Latin America & the Caribbean", "Eastern Europe")
conflicts_selected <- Q3.3[Q3.3$region %in% selected_regions, ]

# Select relevant columns for the correlation analysis
relevant_columns <- c("goal1", "goal2", "goal3", "goal4", "goal5", "goal6", "goal7", "goal8", "goal9", "goal10", "goal11", "goal12", "goal13", "goal15", "goal16", "sum_deaths")

# Compute correlation matrix for the selected regions
correlation_matrix_Conflicts_Deaths <- cor(conflicts_selected[, relevant_columns], use = "complete.obs")

# View the correlation matrix
kable(correlation_matrix_Conflicts_Deaths)
goal1 goal2 goal3 goal4 goal5 goal6 goal7 goal8 goal9 goal10 goal11 goal12 goal13 goal15 goal16 sum_deaths
goal1 1.000 0.476 0.910 0.791 0.406 0.799 0.865 0.546 0.723 0.272 0.783 -0.730 -0.594 0.039 0.613 -0.095
goal2 0.476 1.000 0.544 0.531 0.540 0.638 0.531 0.571 0.530 0.102 0.475 -0.376 -0.322 0.154 0.430 -0.173
goal3 0.910 0.544 1.000 0.814 0.507 0.832 0.876 0.596 0.768 0.223 0.828 -0.745 -0.587 0.014 0.666 -0.117
goal4 0.791 0.531 0.814 1.000 0.645 0.748 0.808 0.536 0.696 0.089 0.768 -0.667 -0.533 0.007 0.496 -0.101
goal5 0.406 0.540 0.507 0.645 1.000 0.587 0.539 0.454 0.516 -0.178 0.620 -0.464 -0.351 0.191 0.384 -0.162
goal6 0.799 0.638 0.832 0.748 0.587 1.000 0.812 0.670 0.734 0.137 0.788 -0.711 -0.529 0.187 0.599 -0.166
goal7 0.865 0.531 0.876 0.808 0.539 0.812 1.000 0.539 0.720 0.152 0.841 -0.704 -0.531 0.039 0.566 -0.094
goal8 0.546 0.571 0.596 0.536 0.454 0.670 0.539 1.000 0.609 0.209 0.542 -0.519 -0.389 0.181 0.462 -0.102
goal9 0.723 0.530 0.768 0.696 0.516 0.734 0.720 0.609 1.000 0.300 0.698 -0.759 -0.689 0.137 0.591 -0.077
goal10 0.272 0.102 0.223 0.089 -0.178 0.137 0.152 0.209 0.300 1.000 0.035 -0.297 -0.299 0.118 0.275 0.078
goal11 0.783 0.475 0.828 0.768 0.620 0.788 0.841 0.542 0.698 0.035 1.000 -0.729 -0.570 0.031 0.656 -0.155
goal12 -0.730 -0.376 -0.745 -0.667 -0.464 -0.711 -0.704 -0.519 -0.759 -0.297 -0.729 1.000 0.865 -0.170 -0.666 0.122
goal13 -0.594 -0.322 -0.587 -0.533 -0.351 -0.529 -0.531 -0.389 -0.689 -0.299 -0.570 0.865 1.000 -0.150 -0.493 0.079
goal15 0.039 0.154 0.014 0.007 0.191 0.187 0.039 0.181 0.137 0.118 0.031 -0.170 -0.150 1.000 0.191 -0.063
goal16 0.613 0.430 0.666 0.496 0.384 0.599 0.566 0.462 0.591 0.275 0.656 -0.666 -0.493 0.191 1.000 -0.162
sum_deaths -0.095 -0.173 -0.117 -0.101 -0.162 -0.166 -0.094 -0.102 -0.077 0.078 -0.155 0.122 0.079 -0.063 -0.162 1.000
Code

# Melt the correlation matrix for ggplot2
cor_melted <- as.data.frame(as.table(correlation_matrix_Conflicts_Deaths))
names(cor_melted) <- c("Variable1", "Variable2", "Correlation")

# Create the heatmap
ggplot(data = cor_melted, aes(Variable1, Variable2, fill = Correlation)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limit = c(-1, 1), space = "Lab",
                       name = "Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, size = 8, hjust = 1),
        axis.text.y = element_text(size = 8)) +
  coord_fixed() +
  labs(x = '', y = '',
       title = 'Correlation between Conflicts deaths and the SDG goals')

Finally, we analyse the correlation between the population affected by conflicts and the SDG scores, only for the Middle East & North Africa, Sub-Saharan Africa, South Asia, Latin America & the Caribbean, Eastern Europe, and Caucasus and Central Asia regions.

Code

# Filter data for specific regions (pop_affected)
selected_regions <- c("Middle East & North Africa", "Sub-Saharan Africa", "South Asia", "Latin America & the Caribbean", "Eastern Europe","Caucasus and Central Asia")
conflicts_selected <- Q3.3[Q3.3$region %in% selected_regions, ]

# Select relevant columns for the correlation analysis
relevant_columns <- c("goal1", "goal2", "goal3", "goal4", "goal5", "goal6", "goal7", "goal8", "goal9", "goal10", "goal11", "goal12", "goal13", "goal15", "goal16", "pop_affected")

# Compute correlation matrix for the selected regions
correlation_matrix_Conflicts_Pop_Affected <- cor(conflicts_selected[, relevant_columns], use = "complete.obs")

# View the correlation matrix
kable(correlation_matrix_Conflicts_Pop_Affected)
goal1 goal2 goal3 goal4 goal5 goal6 goal7 goal8 goal9 goal10 goal11 goal12 goal13 goal15 goal16 pop_affected
goal1 1.000 0.476 0.910 0.791 0.406 0.799 0.865 0.546 0.723 0.272 0.783 -0.730 -0.594 0.039 0.613 -0.066
goal2 0.476 1.000 0.544 0.531 0.540 0.638 0.531 0.571 0.530 0.102 0.475 -0.376 -0.322 0.154 0.430 -0.083
goal3 0.910 0.544 1.000 0.814 0.507 0.832 0.876 0.596 0.768 0.223 0.828 -0.745 -0.587 0.014 0.666 -0.058
goal4 0.791 0.531 0.814 1.000 0.645 0.748 0.808 0.536 0.696 0.089 0.768 -0.667 -0.533 0.007 0.496 -0.030
goal5 0.406 0.540 0.507 0.645 1.000 0.587 0.539 0.454 0.516 -0.178 0.620 -0.464 -0.351 0.191 0.384 -0.152
goal6 0.799 0.638 0.832 0.748 0.587 1.000 0.812 0.670 0.734 0.137 0.788 -0.711 -0.529 0.187 0.599 -0.106
goal7 0.865 0.531 0.876 0.808 0.539 0.812 1.000 0.539 0.720 0.152 0.841 -0.704 -0.531 0.039 0.566 -0.071
goal8 0.546 0.571 0.596 0.536 0.454 0.670 0.539 1.000 0.609 0.209 0.542 -0.519 -0.389 0.181 0.462 -0.099
goal9 0.723 0.530 0.768 0.696 0.516 0.734 0.720 0.609 1.000 0.300 0.698 -0.759 -0.689 0.137 0.591 0.000
goal10 0.272 0.102 0.223 0.089 -0.178 0.137 0.152 0.209 0.300 1.000 0.035 -0.297 -0.299 0.118 0.275 0.074
goal11 0.783 0.475 0.828 0.768 0.620 0.788 0.841 0.542 0.698 0.035 1.000 -0.729 -0.570 0.031 0.656 -0.103
goal12 -0.730 -0.376 -0.745 -0.667 -0.464 -0.711 -0.704 -0.519 -0.759 -0.297 -0.729 1.000 0.865 -0.170 -0.666 0.107
goal13 -0.594 -0.322 -0.587 -0.533 -0.351 -0.529 -0.531 -0.389 -0.689 -0.299 -0.570 0.865 1.000 -0.150 -0.493 0.021
goal15 0.039 0.154 0.014 0.007 0.191 0.187 0.039 0.181 0.137 0.118 0.031 -0.170 -0.150 1.000 0.191 -0.108
goal16 0.613 0.430 0.666 0.496 0.384 0.599 0.566 0.462 0.591 0.275 0.656 -0.666 -0.493 0.191 1.000 -0.099
pop_affected -0.066 -0.083 -0.058 -0.030 -0.152 -0.106 -0.071 -0.099 0.000 0.074 -0.103 0.107 0.021 -0.108 -0.099 1.000
Code

# Melt the correlation matrix for ggplot2
cor_melted <- as.data.frame(as.table(correlation_matrix_Conflicts_Pop_Affected))
names(cor_melted) <- c("Variable1", "Variable2", "Correlation")

# Create the heatmap
ggplot(data = cor_melted, aes(Variable1, Variable2, fill = Correlation)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limit = c(-1, 1), space = "Lab",
                       name = "Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, size = 8, hjust = 1),
        axis.text.y = element_text(size = 8)) +
  coord_fixed() +
  labs(x = '', y = '',
       title = 'Correlation between Conflicts Affected Population and the SDG goals')

4 Focus on the evolution of SDG scores over time

How has the adoption of the SDGs in 2015 influenced the achievement of SDGs?

Code
data_question2 <- read.csv(here("scripts", "data", "data_question24.csv"))
data_question2 <- data_question2 %>% select(-X)

4.1 EDA: General time evolution of SDG scores

First, we look at the evolution of the overall SDG achievement score over time, by continent and by region. The scores increase over the years across the world, but very slowly.

Code
data2 <- data_question2 %>% group_by(year, continent) %>%
  mutate(mean_overall_score_by_year=mean(overallscore))

ggplot(data2) +
  geom_line(mapping=aes(x=year, y=mean_overall_score_by_year, color=continent), lwd=0.8) +
  geom_point(mapping=aes(x=year, y=mean_overall_score_by_year, color=continent), size=1.5) +
  scale_y_continuous(limits = c(0, 100)) +
  labs(title = "Evolution of the mean overall SDG achievement score",
       y = "Mean Overall SDG Score",
       x = "Year"
       )

Looking at the continents, we see that Europe is above the others, while Africa is below, but in general, all have increasing overall scores.

Code
data3 <- data_question2 %>% group_by(year, region) %>%
  mutate(mean_overall_score_by_year=mean(overallscore))

ggplot(data3) +
  geom_line(mapping=aes(x=year, y=mean_overall_score_by_year, color=region), lwd=0.8) +
  geom_point(mapping=aes(x=year, y=mean_overall_score_by_year, color=region), size=1.5) +
  scale_y_continuous(limits = c(0, 100)) +
  labs(title = "Evolution of the mean overall SDG achievement score",
       y = "Mean Overall SDG Score",
       x = "Year"
       )+
  theme(legend.position = "bottom")

Grouping the countries by region refines the previous picture: it is Western Europe in particular that stands above the others, and Sub-Saharan Africa that is clearly below.

Second, we look at the evolution of the 16 SDG achievement scores over time, for the whole world and by continent. We notice that all SDGs except goal 9 (industry, innovation and infrastructure) are close to one another in terms of level and growth. Goal 9 starts far below the others in 2000 and grows faster until it exceeds 50%. In addition, some goals have barely improved over the last two decades, for example goal 13 (climate action) and goal 12 (responsible consumption and production).

Code
data4 <- data_question2 %>%
  group_by(year) %>%
  summarise(across(starts_with("goal"), ~ mean(.x, na.rm = TRUE))) %>%
  pivot_longer(cols = starts_with("goal"), names_to = "goal", values_to = "mean_value")

color_palette <- c("red", "blue", "green", "orange", "purple", "pink", "lightblue", "gray", "cyan", "magenta", "yellow", "darkgreen", "darkblue", "darkred", "darkgrey", "darkcyan")

ggplot(data = data4) +
  geom_line(mapping = aes(x = year, y = mean_value, color = goal), size = 0.7) +
  geom_point(mapping = aes(x = year, y = mean_value, color = goal), size = 1) +
  scale_color_manual(values = color_palette) +
  scale_y_continuous(limits = c(0, 100)) +
  labs(title = "Evolution of the mean SDG achievement scores across the world",
       y = "Mean SDG Scores",
       x = "Year"
       ) 

We continue with the graph that distinguishes continents to get more information.

Code
data5 <- data_question2 %>%
  group_by(year, continent) %>%
  summarise(across(starts_with("goal"), ~ mean(.x, na.rm = TRUE))) %>%
  pivot_longer(cols = starts_with("goal"), names_to = "goal", values_to = "mean_value")

ggplot(data = data5) +
  geom_line(mapping = aes(x = year, y = mean_value, color=continent), size = 0.7) +
  scale_color_manual(values = color_palette) +
  scale_y_continuous(limits = c(0, 100)) +
  labs(title = "Evolution of the mean SDG achievement scores by continent",
       y = "Mean SDG Scores",
       x = "Years from 2000 to 2022"
       ) +
  facet_wrap(~ goal, nrow = 4)+
  scale_x_continuous(breaks = NULL)+
  theme_light()

We observe that most of the time, Europe is at the top of the graph and Africa at the bottom, except for goals 12 and 13, which are linked to ecology. Some other points stand out:

  • The Americas are far behind the other parts of the world regarding goal 10 (reduced inequalities).

  • Africa is far behind the other continents (even if it is improving) for goals 1, 3, 4 and 7.

  • Goal 9 (industry, innovation and infrastructure) shows exponential growth for almost all continents.

Third, we create an interactive world map that lets the reader navigate from 2000 to 2022 and see the level of SDG achievement (overall score) of each country. To open it in your browser, use this R file: interactive_map_1. Shown here is only a non-interactive world map of the overall SDG achievement scores, which does not take the evolution over the years into account.

Code
library(rnaturalearth)
library(tidyverse)
library(sf)
# Load world map data
world <- ne_countries(scale = "medium", returnclass = "sf")

# Merge data with the world map data
data0 <- merge(world, data_question2, by.x = "iso_a3", by.y = "code", all.x = TRUE)

data0 %>%
  sf::st_transform(crs = "+proj=robin") %>%
  ggplot() +
  geom_sf(color = "lightgrey") +
  geom_sf(aes(fill = overallscore), color = NA) +
  scale_fill_gradientn(
    colors = c("darkred", "orange", "yellow", "darkgreen"),
    values = scales::rescale(c(0, 0.25, 0.5, 1)),
    name = "Overall Score",
    na.value = NA
  ) +
  labs(title = "Mean overall SDG achievement score by country")+
  coord_sf(datum = NA) +
  theme_minimal()

Again, we see that the overall SDG achievement score is increasing and that the countries with the most red (low scores) are in Africa. However, it is also there that scores increase most rapidly. Our hypothesis is that a very low score is easier to improve than a very high one (around 90%), which may be hard to increase further because doing so would mean approaching perfection. In the next section, we investigate this idea further.
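One rough way to check the hypothesis above is to relate each country's starting score to its average yearly change. The sketch below is only an illustration: the data frame `toy_scores` and its values are made up, and the column names `code`, `year` and `overallscore` are assumed to match those used elsewhere in this report.

```r
library(dplyr)

# Hypothetical toy panel: three countries, three years (values are invented)
toy_scores <- data.frame(
  code = rep(c("AAA", "BBB", "CCC"), each = 3),
  year = rep(2000:2002, times = 3),
  overallscore = c(40, 43, 46,      # low starting score, fast gains
                   65, 66, 67,      # medium starting score
                   88, 88.2, 88.4)  # high starting score, slow gains
)

# Initial score and average yearly change per country
catch_up <- toy_scores %>%
  group_by(code) %>%
  arrange(year, .by_group = TRUE) %>%
  summarise(
    initial_score = first(overallscore),
    mean_yearly_change = (last(overallscore) - first(overallscore)) / (n() - 1)
  )

# A negative correlation would be consistent with the hypothesis
catch_up_cor <- cor(catch_up$initial_score, catch_up$mean_yearly_change)
catch_up_cor
```

On the real data, a clearly negative correlation would support the idea that low-scoring countries improve faster, although it would not by itself show why.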

4.2 Analysis: SDG adoption in 2015

We create one new variable per goal that captures the difference in SDG score between the year of the observation and the previous year. This allows us to see how much countries improve (or not) on SDG scores each year. In addition, to prepare for the specific question around 2015, we keep only the years from 2009 to 2022 (7 years up to and including 2015, and 7 years after).

Code
binary2015 <- data_question2 %>% 
  group_by(code) %>%
  mutate(across(5:21, ~ . - dplyr::lag(.), .names = "diff_{.col}")) %>%
  ungroup()

# Create a new column (binary variable) with value 1 if the year is after 2015 and zero otherwise. 
binary2015 <- binary2015 %>% 
  mutate(after2015 = ifelse(year > 2015, 1, 0)) %>%
  filter(as.numeric(year)>=2009)
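To make the construction of the yearly differences concrete, here is a minimal sketch on invented data. The data frame `toy` is hypothetical. Unlike the code above, the sketch sorts by year explicitly within each country, since `lag()` relies on row order.

```r
library(dplyr)

# Hypothetical toy panel (values are invented)
toy <- data.frame(
  code = c("AAA", "AAA", "AAA", "BBB", "BBB", "BBB"),
  year = c(2014, 2015, 2016, 2014, 2015, 2016),
  overallscore = c(60, 61, 61.5, 70, 69, 72)
)

toy_diff <- toy %>%
  group_by(code) %>%
  arrange(year, .by_group = TRUE) %>%   # lag() depends on the row order
  mutate(diff_overallscore = overallscore - lag(overallscore)) %>%
  ungroup() %>%
  mutate(after2015 = ifelse(year > 2015, 1, 0))

toy_diff
```

The first year of each country has no previous value, so its difference is `NA`. The year 2015 itself is coded 0, meaning it belongs to the "before" period.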

We begin by looking at the distribution of the difference in SDG scores from one year to the next (improvement if it is above zero and deterioration if it is below zero).

Code
# histogram of difference in scores between years
unique_years <- unique(binary2015$year)
plot_ly() %>%
  add_trace(
    type = "histogram", 
    data = binary2015, 
    x = ~diff_overallscore[year == 2009],
    marker = list(color = "lightgreen", line = list(color = "black", width = 1))
  ) %>%
  layout(
    title = "Distribution of SDG evolution",
    xaxis = list(title = "Year difference SDG score", range = c(-3, 3)),
    yaxis = list(title = "Frequency", range = c(0, 40)),
    sliders = list(
      list(
        active = 0,
        currentvalue = list(prefix = "Year: "),
        steps = lapply(seq_along(unique_years), function(i) {
          year <- unique_years[i]
          list(
            label = as.character(year),
            method = "restyle",
            args = list(
              list(x = list(binary2015$diff_overallscore[binary2015$year == year]))
            )
          )
        })
      )
    )
  )

We notice that, across the years, most of the distribution lies to the right of zero on the x-axis, which means that there are more improvements than deteriorations. When there is deterioration, it is less than 1% per year, except in some extreme cases: in 2013, for instance, the overall SDG score of one country decreased by almost 3%. It is also rare to see improvements of more than 2% per year. Regarding our specific question, we do not see a major improvement in the distribution after 2015. If that were the case, the distribution would shift further to the right; instead, except for 2017, more and more values are centered around zero, which means smaller score improvements overall.
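The visual reading of the histograms can be complemented by a few summary numbers per year. The following sketch uses an invented data frame `toy_diff`; only the column names `year` and `diff_overallscore` are taken from the code above.

```r
library(dplyr)

# Hypothetical yearly differences for two years (values are invented)
toy_diff <- data.frame(
  year = rep(c(2014, 2017), each = 4),
  diff_overallscore = c(0.8, 0.5, -0.2, 1.1,   # 2014
                        0.1, 0.0, -0.4, 0.3)   # 2017
)

yearly_summary <- toy_diff %>%
  group_by(year) %>%
  summarise(
    share_improving = mean(diff_overallscore > 0, na.rm = TRUE),  # share of positive changes
    median_change   = median(diff_overallscore, na.rm = TRUE),
    worst_change    = min(diff_overallscore, na.rm = TRUE)
  )

yearly_summary
```

A falling `share_improving` or `median_change` after 2015 would correspond to the distribution moving back towards zero.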

After visualizing the improvements and declines in the overall SDG score for the whole world, we turn to the top 5 countries in terms of improvement each year. Major improvements often come from countries in Sub-Saharan Africa or in the Middle East and North Africa. This suggests that more effort is made in these regions to achieve better scores, but we also know from our previous visualizations that their initial scores are lower. Moreover, the largest improvements are around 3% per year and were mostly achieved before 2015. In terms of maximum improvements, then, the adoption of the SDGs in 2015 does not appear to have had a strong impact. We also notice that 2020 is the year in which the best improvements are the smallest. We keep that in mind for the next question regarding events, and specifically COVID.

Code
top_n_values <- 5

# Test with ggplot2
custom_colors <- c("blue", "darkblue", "cyan", "green", "darkgreen", "lightgreen", "lightblue","turquoise", "lightgrey", "darkgrey")

# Get unique regions in the dataset
unique_regions <- unique(binary2015$region)

# Create a color dictionary mapping each region to a specific color
region_colors <- setNames(custom_colors[1:length(unique_regions)], unique_regions)

library(patchwork)

plots <- list()

for (year in unique_years) {
  top_countries <- binary2015[binary2015$year == year, ] %>%
    arrange(desc(year), desc(diff_overallscore)) %>%
    head(n = top_n_values)
  
  plot <- ggplot(data = top_countries, mapping = aes(x = country, y = diff_overallscore, fill = region)) +
    geom_bar(stat = "identity") +
    scale_fill_manual(values = region_colors) +  # Use the specified colors
    labs(title = paste(year), x = NULL, y = NULL) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size= 6), legend.position = "none", plot.title = element_text(size = 10)) + 
    scale_y_continuous(limits = c(0, 3))
  
  plots[[as.character(year)]] <- plot
}

# Arrange the plots in a grid with 5 columns using patchwork
wrap <- wrap_plots(plots, ncol = 5)

wrap + plot_annotation(
  title = 'Best 5 countries in terms of SDG score improvement'
)
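As an alternative to the loop, the per-year selection of the best countries can be written as a single grouped operation. This is only a sketch on invented data (`toy` is hypothetical), using `dplyr::slice_max()` and keeping two countries per year instead of five to stay short.

```r
library(dplyr)

# Hypothetical yearly differences for four countries (values are invented)
toy <- data.frame(
  year = rep(c(2010, 2011), each = 4),
  country = rep(c("A", "B", "C", "D"), times = 2),
  diff_overallscore = c(0.5, 2.1, -0.3, 1.0,
                        1.5, 0.2, 0.9, -1.2)
)

# Keep the two largest improvements within each year
top2 <- toy %>%
  group_by(year) %>%
  slice_max(diff_overallscore, n = 2, with_ties = FALSE) %>%
  ungroup()

top2
```

The resulting table could then be plotted with a single `ggplot()` call and `facet_wrap(~ year, scales = "free_x")`, which would also give one shared legend.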

Code
# Create a common legend manually
legend_data <- data.frame(region = unique_regions)
legend_plot <- ggplot(legend_data, aes(x = region, fill = region)) +
  geom_bar(position = position_stack(reverse = TRUE)) +
  scale_fill_manual(values = region_colors) +
  labs(title = "Regions") +
  theme_void() +
  theme(
    legend.position = "none",
    axis.text.y = element_text(angle = 0, hjust = 1, size = 18),
    plot.title = element_text(size = 20, face = "bold")
  ) +
  coord_flip()

legend_plot

We continue with the 5 worst countries in terms of decline in overall SDG score each year, and we see that the worst declines occur in the most recent years. Declines were generally no more than 1% until 2018, after which larger declines became more frequent. The adoption of the SDGs in 2015 may have had a positive impact, because during the two following years the worst declines were small (no more than 1% in 2016 and no more than 0.5% in 2017). The scores seemed to stabilize, but only briefly, because more extreme deteriorations followed. Interestingly, the regions with the worst declines over the past twelve years vary widely; the only pattern appears in the last four years, when most of them are in Latin America and the Caribbean.

Code
plots <- list()

for (year in unique_years) {
  top_countries <- binary2015[binary2015$year == year, ] %>%
    arrange(desc(year), diff_overallscore) %>%
    head(n = top_n_values)
  
  plot <- ggplot(data = top_countries, mapping = aes(x = country, y = diff_overallscore, fill = region)) +
    geom_bar(stat = "identity") +
    scale_fill_manual(values = region_colors) +  # Use the specified colors
    labs(title = paste(year), x = NULL, y = NULL) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1, size=6), legend.position = "none", plot.title = element_text(size = 10)) + 
    scale_y_continuous(limits = c(-3,0))
  
  plots[[as.character(year)]] <- plot
}

# Arrange the plots in a grid with 5 columns using patchwork
wrap <- wrap_plots(plots, ncol = 5)

wrap + plot_annotation(
  title = 'Worst 5 countries in terms of SDG score improvement'
)

Code
legend_plot

We move on to the specific SDG scores and look at the 20 best improvements for each score. We additionally differentiate between the improvements that occurred before and after 2015. We want to see which goals show the best improvements and which countries put the most effort into them.

Code
# Best improvements
data_long <- binary2015 %>%
  pivot_longer(cols = c(starts_with("diff_goal"), "diff_overallscore"),
               names_to = "goal", values_to = "improvement") %>%
  group_by(goal) %>%
  top_n(20, wt = improvement) %>%
  ungroup()

plot_ly() %>%
  add_trace(
    type = "bar",
    data = data_long,
    x = ~country[after2015 == 1 & goal == "diff_overallscore"],
    y = ~improvement[after2015 == 1 & goal == "diff_overallscore"],
    legendgroup = "after 2015",
    name = "after 2015",
    marker = list(color = "blue", size = 10),
    showlegend = TRUE
  ) %>%
  add_trace(
    type = "bar",
    x = ~country[after2015 == 0 & goal == "diff_overallscore"],
    y = ~improvement[after2015 == 0 & goal == "diff_overallscore"],
    legendgroup = "before 2015",
    name = "before 2015",
    marker = list(color = "red", size = 10),
    showlegend = TRUE
  ) %>%
  layout(
    title = paste("Top 20 countries per SDG Score evolution"),
    yaxis = list(title = "Year difference SDG score", range = c(0, 50)),
    xaxis = list(title = "Countries", categoryorder = "total ascending"),
    barmode = "stack",
    updatemenus = list(
      list(
        buttons = list(
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_overallscore"],
                  ~improvement[after2015 == 0 & goal == "diff_overallscore"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_overallscore"],
                  ~country[after2015 == 0 & goal == "diff_overallscore"]
                )
              )
            ),
            label = "Overall score",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal1"],
                  ~improvement[after2015 == 0 & goal == "diff_goal1"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal1"],
                  ~country[after2015 == 0 & goal == "diff_goal1"]
                )
              )
            ),
            label = "Goal 1",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal2"],
                  ~improvement[after2015 == 0 & goal == "diff_goal2"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal2"],
                  ~country[after2015 == 0 & goal == "diff_goal2"]
                )
              )
            ),
            label = "Goal 2",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal3"],
                  ~improvement[after2015 == 0 & goal == "diff_goal3"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal3"],
                  ~country[after2015 == 0 & goal == "diff_goal3"]
                )
              )
            ),
            label = "Goal 3",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal4"],
                  ~improvement[after2015 == 0 & goal == "diff_goal4"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal4"],
                  ~country[after2015 == 0 & goal == "diff_goal4"]
                )
              )
            ),
            label = "Goal 4",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal5"],
                  ~improvement[after2015 == 0 & goal == "diff_goal5"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal5"],
                  ~country[after2015 == 0 & goal == "diff_goal5"]
                )
              )
            ),
            label = "Goal 5",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal6"],
                  ~improvement[after2015 == 0 & goal == "diff_goal6"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal6"],
                  ~country[after2015 == 0 & goal == "diff_goal6"]
                )
              )
            ),
            label = "Goal 6",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal7"],
                  ~improvement[after2015 == 0 & goal == "diff_goal7"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal7"],
                  ~country[after2015 == 0 & goal == "diff_goal7"]
                )
              )
            ),
            label = "Goal 7",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal8"],
                  ~improvement[after2015 == 0 & goal == "diff_goal8"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal8"],
                  ~country[after2015 == 0 & goal == "diff_goal8"]
                )
              )
            ),
            label = "Goal 8",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal9"],
                  ~improvement[after2015 == 0 & goal == "diff_goal9"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal9"],
                  ~country[after2015 == 0 & goal == "diff_goal9"]
                )
              )
            ),
            label = "Goal 9",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal10"],
                  ~improvement[after2015 == 0 & goal == "diff_goal10"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal10"],
                  ~country[after2015 == 0 & goal == "diff_goal10"]
                )
              )
            ),
            label = "Goal 10",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal11"],
                  ~improvement[after2015 == 0 & goal == "diff_goal11"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal11"],
                  ~country[after2015 == 0 & goal == "diff_goal11"]
                )
              )
            ),
            label = "Goal 11",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal12"],
                  ~improvement[after2015 == 0 & goal == "diff_goal12"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal12"],
                  ~country[after2015 == 0 & goal == "diff_goal12"]
                )
              )
            ),
            label = "Goal 12",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal13"],
                  ~improvement[after2015 == 0 & goal == "diff_goal13"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal13"],
                  ~country[after2015 == 0 & goal == "diff_goal13"]
                )
              )
            ),
            label = "Goal 13",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal15"],
                  ~improvement[after2015 == 0 & goal == "diff_goal15"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal15"],
                  ~country[after2015 == 0 & goal == "diff_goal15"]
                )
              )
            ),
            label = "Goal 15",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal16"],
                  ~improvement[after2015 == 0 & goal == "diff_goal16"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal16"],
                  ~country[after2015 == 0 & goal == "diff_goal16"]
                )
              )
            ),
            label = "Goal 16",
            method = "restyle"
          ),
          list(
            args = list(
              list(
                y = list(
                  ~improvement[after2015 == 1 & goal == "diff_goal17"],
                  ~improvement[after2015 == 0 & goal == "diff_goal17"]
                ),
                x = list(
                  ~country[after2015 == 1 & goal == "diff_goal17"],
                  ~country[after2015 == 0 & goal == "diff_goal17"]
                )
              )
            ),
            label = "Goal 17",
            method = "restyle"
          )
        )
      )
    )
  )

We notice various patterns, among them:

  • Goals 2 (zero hunger), 3 (good health and well-being), 6 (clean water and sanitation), 8 (decent work and economic growth), 12 (responsible consumption and production) and 16 (peace, justice and strong institutions) show very low improvements per year: even the best ones are below 10%.

  • Goal 10 (reduced inequalities) has the best improvements: all 20 of them are above 20%, and they go up to 45%.

  • Some goals clearly had most of their best improvements before 2015: goals 3 (good health and well-being), 5 (gender equality), 6 (clean water and sanitation), 7 (affordable and clean energy).

  • Some goals clearly had most of their best improvements after 2015: goals 8 (decent work and economic growth), 12 (responsible consumption and production).
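The before/after split described in these bullets can also be counted directly instead of being read off the bar chart. The sketch below is a hypothetical illustration on invented data: it keeps the 3 best improvements per goal (rather than 20) and counts how many fall before and after 2015.

```r
library(dplyr)
library(tidyr)

# Hypothetical yearly differences for two goals (values are invented)
toy <- data.frame(
  country = c("A", "B", "C", "A", "B", "C"),
  year = c(2012, 2012, 2012, 2018, 2018, 2018),
  after2015 = c(0, 0, 0, 1, 1, 1),
  diff_goal1 = c(5, 1, 3, 2, 0.5, 4),
  diff_goal2 = c(0.2, 0.1, 0.3, 0.9, 0.8, 0.05)
)

top_counts <- toy %>%
  pivot_longer(starts_with("diff_goal"), names_to = "goal", values_to = "improvement") %>%
  group_by(goal) %>%
  slice_max(improvement, n = 3, with_ties = FALSE) %>%  # best 3 improvements per goal
  ungroup() %>%
  count(goal, after2015, name = "n_top")                # how many fall in each period

top_counts
```

A goal whose best improvements mostly carry `after2015 == 0` would belong to the "before 2015" group of the list above.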

Regarding the impact of the adoption of the SDGs in 2015, we cannot say that it had a positive impact, because there are no more large improvements after 2015 than before, and even slightly fewer. In addition, the most impressive improvements mostly occurred before 2015. These conclusions are supported by the next graph: we fit two different regression lines (before and after 2015) to see whether there is a jump after the adoption and whether the SDG scores increase faster.

Code
# Graphs to show the jump (or not) in 2015

# Filter data
data_after_2015 <- filter(binary2015, as.numeric(year) >= 2015)
data_before_2016 <- filter(binary2015, as.numeric(year) <= 2015)

plotly::plot_ly() %>%
  plotly::add_trace(data = data_after_2015, x = ~year, y = ~fitted(lm(overallscore ~ year, data = data_after_2015)), type = 'scatter', mode = 'lines', line = list(color = 'blue'), name = "After 2015") %>%
  plotly::add_trace(data = data_before_2016, x = ~year, y = ~fitted(lm(overallscore ~ year, data = data_before_2016)), type = 'scatter', mode = 'lines', line = list(color = 'red'), name = "Before 2015") %>%
  plotly::layout(title = "Different patterns across SDGs before and after 2015",
         xaxis = list(title = "Year"),
         yaxis = list(title = "SDG achievement score", range = c(30, 85)),
         shapes = list(
           list(
             type = 'line',
             x0 = 2015,
             x1 = 2015,
             y0 = 0,
             y1 = 1,
             yref = 'paper',
             line = list(color = 'grey', width = 2, dash = 'dot')
           )
         ),
         updatemenus = list(
           list(
             buttons = list(
               list(
                 args = list("y", list(
                   ~fitted(lm(overallscore ~ year, data = data_after_2015)),
                   ~fitted(lm(overallscore ~ year, data = data_before_2016))
                 )),
                 label = "Overall score",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal1 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal1 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 1: \nno poverty",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal2 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal2 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 2: \nzero hunger",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal3 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal3 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 3: good health \nand well-being",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal4 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal4 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 4: \nquality education",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal5 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal5 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 5: \ngender equality",
                 method = "restyle"
               ), 
               list(
                 args = list("y", list(
                   ~fitted(lm(goal6 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal6 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 6: clean water \nand sanitation",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal7 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal7 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 7: affordable \nand clean energy",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal8 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal8 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 8: decent work \nand economic growth",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal9 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal9 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 9: industry, innovation \nand infrastructure",
                 method = "restyle"
               ), 
               list(
                 args = list("y", list(
                   ~fitted(lm(goal10 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal10 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 10: \nreduced inequalities",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal11 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal11 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 11: sustainable \ncities and communities",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal12 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal12 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 12: responsible \nconsumption and production",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal13 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal13 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 13: \nclimate action",
                 method = "restyle"
               ), 
               list(
                 args = list("y", list(
                   ~fitted(lm(goal15 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal15 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 15: \nlife on land",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal16 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal16 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 16: peace, justice \nand strong institutions",
                 method = "restyle"
               ),
               list(
                 args = list("y", list(
                   ~fitted(lm(goal17 ~ year, data = data_after_2015)),
                   ~fitted(lm(goal17 ~ year, data = data_before_2016))
                 )),
                 label = "Goal 17: partnerships \nfor the goals",
                 method = "restyle"
               )
             )
           )
         )
  )
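The two regression lines can also be compared numerically through their slopes. The sketch below uses invented data (`toy` is hypothetical) and includes one missing score to illustrate a detail: with the default `na.action`, `fitted()` drops rows with missing values, whereas `predict()` with `newdata` returns one value per row, which keeps the fitted line aligned with the x-axis values.

```r
# Hypothetical pooled panel: two countries, 2009 to 2022 (values are invented)
toy <- data.frame(
  year = rep(2009:2022, times = 2),
  overallscore = c(50 + 0.5 * (0:13), 70 + 0.2 * (0:13))
)
toy$overallscore[3] <- NA  # one missing score, as can happen for some goals

before <- subset(toy, year <= 2015)
after  <- subset(toy, year >= 2015)

fit_before <- lm(overallscore ~ year, data = before)
fit_after  <- lm(overallscore ~ year, data = after)

# One prediction per row, even where the score itself is missing
before$fitted_score <- predict(fit_before, newdata = before)
after$fitted_score  <- predict(fit_after, newdata = after)

# Yearly increase in the score before and after 2015
slopes <- c(before = unname(coef(fit_before)["year"]),
            after  = unname(coef(fit_after)["year"]))
slopes
```

A larger `after` slope would indicate faster progress after the adoption. A visible gap between the two lines at 2015 would indicate a jump.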

Simple OLS regression of the year-to-year difference in SDG scores on the after2015 indicator:

Code
# Simple linear regressions of the yearly difference in each SDG score on the binary variable "after2015"
library(huxtable)
reg2.1 <- lm(diff_overallscore ~ after2015, data=binary2015)
reg2.1.1 <- lm(diff_goal1 ~ after2015, data=binary2015)
reg2.1.2 <- lm(diff_goal2 ~ after2015, data=binary2015)
reg2.1.3 <- lm(diff_goal3 ~ after2015, data=binary2015)
reg2.1.4 <- lm(diff_goal4 ~ after2015, data=binary2015)
reg2.1.5 <- lm(diff_goal5 ~ after2015, data=binary2015)
reg2.1.6 <- lm(diff_goal6 ~ after2015, data=binary2015)
reg2.1.7 <- lm(diff_goal7 ~ after2015, data=binary2015)
reg2.1.8 <- lm(diff_goal8 ~ after2015, data=binary2015)
reg2.1.9 <- lm(diff_goal9 ~ after2015, data=binary2015)
reg2.1.10 <- lm(diff_goal10 ~ after2015, data=binary2015)
reg2.1.11 <- lm(diff_goal11 ~ after2015, data=binary2015)
reg2.1.12 <- lm(diff_goal12 ~ after2015, data=binary2015)
reg2.1.13 <- lm(diff_goal13 ~ after2015, data=binary2015)
reg2.1.15 <- lm(diff_goal15 ~ after2015, data=binary2015)
reg2.1.16 <- lm(diff_goal16 ~ after2015, data=binary2015)
reg2.1.17 <- lm(diff_goal17 ~ after2015, data=binary2015)

models_list1 <- list("Overall score"=reg2.1, "Goal 1"=reg2.1.1, "Goal 2"=reg2.1.2, "Goal 3"= reg2.1.3, "Goal 4"=reg2.1.4, "Goal 5"=reg2.1.5, "Goal 6"= reg2.1.6, "Goal 7"=reg2.1.7, "Goal 8"=reg2.1.8, "Goal 9"=reg2.1.9, "Goal 10"=reg2.1.10, "Goal 11" = reg2.1.11, "Goal 12"=reg2.1.12, "Goal 13"=reg2.1.13, "Goal 15" =reg2.1.15, "Goal 16"=reg2.1.16, "Goal 17"=reg2.1.17)

huxreg(models_list1[1:9])
             Overall score  Goal 1         Goal 2         Goal 3         Goal 4         Goal 5         Goal 6         Goal 7         Goal 8
(Intercept)  0.398 ***      0.535 ***      0.259 ***      0.764 ***      0.582 ***      0.863 ***      0.276 ***      0.510 ***      0.072 *
             (0.017)        (0.065)        (0.051)        (0.035)        (0.088)        (0.055)        (0.013)        (0.048)        (0.034)
after2015    -0.073 **      -0.194 *       -0.116         -0.423 ***     -0.275 *       -0.315 ***     -0.127 ***     -0.259 ***     0.203 ***
             (0.024)        (0.091)        (0.072)        (0.050)        (0.125)        (0.077)        (0.018)        (0.068)        (0.048)
N            2170           2002           2170           2170           2170           2170           2170           2170           2170
R2           0.004          0.002          0.001          0.033          0.002          0.008          0.021          0.007          0.008
logLik       -1859.183      -4268.647      -4209.838      -3388.407      -5388.474      -4356.774      -1236.908      -4082.277      -3307.742
AIC          3724.367       8543.295       8425.676       6782.813       10782.949      8719.549       2479.816       8170.554       6621.484
*** p < 0.001; ** p < 0.01; * p < 0.05.
Code
huxreg(models_list1[10:17])
             Goal 9         Goal 10        Goal 11        Goal 12        Goal 13        Goal 15        Goal 16        Goal 17
(Intercept)  1.496 ***      0.491 ***      0.244 **       0.053 *        0.094 *        0.207 ***      0.063          0.107
             (0.068)        (0.132)        (0.082)        (0.024)        (0.041)        (0.044)        (0.048)        (0.062)
after2015    -0.010         -0.130         -0.044         -0.002         -0.014         -0.225 ***     -0.260 ***     0.709 ***
             (0.096)        (0.187)        (0.116)        (0.034)        (0.057)        (0.063)        (0.068)        (0.088)
N            2170           2002           2170           2170           2170           2170           2170           2170
R2           0.000          0.000          0.000          0.000          0.000          0.006          0.007          0.029
logLik       -4834.229      -5702.552      -5233.598      -2542.387      -3711.381      -3904.359      -4077.124      -4625.628
AIC          9674.457       11411.104      10473.196      5090.775       7428.762       7814.718       8160.248       9257.256
*** p < 0.001; ** p < 0.01; * p < 0.05.
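In these tables, the intercept is the mean yearly change up to and including 2015, and the `after2015` coefficient is the difference between the mean yearly change after 2015 and that mean. The sketch below checks this on invented data and, as a side note, shows how the seventeen near-identical `lm()` calls could be generated in a loop with `reformulate()`. The data frame `toy` and its values are hypothetical.

```r
set.seed(1)

# Hypothetical yearly differences for two goals (random values)
toy <- data.frame(
  after2015 = rep(c(0, 1), each = 20),
  diff_goal1 = rnorm(40),
  diff_goal2 = rnorm(40)
)

outcomes <- c("diff_goal1", "diff_goal2")

# One regression per outcome: diff_goalX ~ after2015
models <- lapply(outcomes, function(v) {
  lm(reformulate("after2015", response = v), data = toy)
})
names(models) <- c("Goal 1", "Goal 2")

# Coefficient on the dummy for each model
after_coefs <- sapply(models, function(m) coef(m)[["after2015"]])

# Difference in group means for goal 1, for comparison
mean_gap_goal1 <- mean(toy$diff_goal1[toy$after2015 == 1]) -
  mean(toy$diff_goal1[toy$after2015 == 0])

after_coefs
mean_gap_goal1
```

A named list such as `models` can be passed to `huxreg()` in the same way as `models_list1` above.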

Difference-in-differences (DiD) using panel data:

Code
# Create a panel data object
panel_data <- plm::pdata.frame(binary2015, index = c("country", "year"))

# Run the difference-in-differences model to take into account the general evolution over the years
reg2.2 <- plm::plm(diff_overallscore ~ after2015 + year + after2015:year, 
                 data = panel_data,
                 model = "within")
reg2.2.1 <- plm::plm(diff_goal1 ~ after2015 + year + after2015:year, 
              data = panel_data,
              model = "within")
reg2.2.2 <- plm::plm(diff_goal2 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.3 <- plm::plm(diff_goal3 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.4 <- plm::plm(diff_goal4 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.5 <- plm::plm(diff_goal5 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.6 <- plm::plm(diff_goal6 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.7 <- plm::plm(diff_goal7 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.8 <- plm::plm(diff_goal8 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.9 <- plm::plm(diff_goal9 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.10 <- plm::plm(diff_goal10 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.11 <- plm::plm(diff_goal11 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.12 <- plm::plm(diff_goal12 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.13 <- plm::plm(diff_goal13 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.15 <- plm::plm(diff_goal15 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.16 <- plm::plm(diff_goal16 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")
reg2.2.17 <- plm::plm(diff_goal17 ~ after2015 + year + after2015:year, 
                data = panel_data,
                model = "within")

# Create a list of your regression models
models_list2 <- list("Overall score"=reg2.2, "Goal 1"=reg2.2.1, "Goal 2"=reg2.2.2, "Goal 3"= reg2.2.3, "Goal 4"=reg2.2.4, "Goal 5"=reg2.2.5, "Goal 6"= reg2.2.6, "Goal 7"=reg2.2.7, "Goal 8"=reg2.2.8, "Goal 9"=reg2.2.9, "Goal 10"=reg2.2.10, "Goal 11" = reg2.2.11,"Goal 12"=reg2.2.12, "Goal 13"=reg2.2.13, "Goal 15" =reg2.2.15, "Goal 16"=reg2.2.16, "Goal 17"=reg2.2.17)

huxreg(models_list2[1:9])
Overall score Goal 1 Goal 2 Goal 3 Goal 4 Goal 5 Goal 6 Goal 7 Goal 8
after2015 -0.298 *** 1.036 *** -0.449 *   -0.730 *** -0.390   -0.382     -0.306 *** -0.645 *** 0.788 ***
(0.062)    (0.218)    (0.194)    (0.119)    (0.328)  (0.203)    (0.041)    (0.179)    (0.113)   
year2010 0.021     0.000     -0.523 **  1.019 *** 0.359   0.023     -0.057     -0.164     0.303 ** 
(0.062)    (0.218)    (0.194)    (0.119)    (0.328)  (0.203)    (0.041)    (0.179)    (0.113)   
year2011 -0.045     0.878 *** -0.089     -0.108     0.424   -0.056     -0.023     -0.355 *   0.446 ***
(0.062)    (0.218)    (0.194)    (0.119)    (0.328)  (0.203)    (0.041)    (0.179)    (0.113)   
year2012 0.087     0.762 *** -0.543 **  -0.179     0.379   0.288     -0.062     0.017     0.450 ***
(0.062)    (0.218)    (0.194)    (0.119)    (0.328)  (0.203)    (0.041)    (0.179)    (0.113)   
year2013 -0.005     1.015 *** -0.190     -0.309 **  -0.045   1.259 *** -0.020     -0.057     0.417 ***
(0.062)    (0.218)    (0.194)    (0.119)    (0.328)  (0.203)    (0.041)    (0.179)    (0.113)   
year2014 0.157 *   0.508 *   0.086     -0.295 *   0.730 * 0.101     -0.052     -0.447 *   1.452 ***
(0.062)    (0.218)    (0.194)    (0.119)    (0.328)  (0.203)    (0.041)    (0.179)    (0.113)   
year2015 0.001     0.580 **  -0.735 *** 0.325 **  0.038   0.213     -0.002     0.061     0.536 ***
(0.062)    (0.218)    (0.194)    (0.119)    (0.328)  (0.203)    (0.041)    (0.179)    (0.113)   
year2016 0.207 *** -0.710 **  -0.128     0.432 *** 0.447   0.314     0.233 *** 0.370 *   -0.252 *  
(0.062)    (0.218)    (0.194)    (0.119)    (0.328)  (0.203)    (0.041)    (0.179)    (0.113)   
year2017 0.589 *** -0.558 *   0.021     0.900 *** 0.572   0.497 *   0.263 *** 0.470 **  0.821 ***
(0.062)    (0.218)    (0.194)    (0.119)    (0.328)  (0.203)    (0.041)    (0.179)    (0.113)   
year2018 0.254 *** -0.594 **  0.210     0.501 *** 0.799 * 0.264     0.208 *** 0.351 *   -0.187    
(0.062)    (0.218)    (0.194)    (0.119)    (0.328)  (0.203)    (0.041)    (0.179)    (0.113)   
year2019 0.337 *** -0.831 *** 0.295     0.868 *** 0.474   0.614 **  0.159 *** 0.270     -0.262 *  
(0.062)    (0.218)    (0.194)    (0.119)    (0.328)  (0.203)    (0.041)    (0.179)    (0.113)   
year2020 0.127 *   -1.874 *** -0.045     -0.135     0.388   0.167     0.179 *** 0.296     -1.159 ***
(0.062)    (0.218)    (0.194)    (0.119)    (0.328)  (0.203)    (0.041)    (0.179)    (0.113)   
year2021 0.274 *** -0.294     -0.015     0.032     0.010   0.441 *   -0.000     0.000     0.545 ***
(0.062)    (0.218)    (0.194)    (0.119)    (0.328)  (0.203)    (0.041)    (0.179)    (0.113)   
N 2170         2002         2170         2170         2170       2170         2170         2170         2170        
R2 0.060     0.068     0.020     0.175     0.011   0.043     0.070     0.020     0.228    
*** p < 0.001; ** p < 0.01; * p < 0.05.
Code
huxreg(models_list2[10:17])
Goal 9 Goal 10 Goal 11 Goal 12 Goal 13 Goal 15 Goal 16 Goal 17
after2015 -0.654 **  -1.080 * -0.394     -0.292 *** -0.395 **  -0.405 *   -1.150 *** 0.032    
(0.231)    (0.500)  (0.309)    (0.087)    (0.151)    (0.163)    (0.173)    (0.229)   
year2010 0.174     -0.483   0.127     -0.319 *** -0.668 *** 0.246     0.617 *** -0.395    
(0.231)    (0.500)  (0.309)    (0.087)    (0.151)    (0.163)    (0.173)    (0.229)   
year2011 0.460 *   -0.798   -0.783 *   -0.422 *** -0.688 *** -0.114     0.486 **  -0.234    
(0.231)    (0.500)  (0.309)    (0.087)    (0.151)    (0.163)    (0.173)    (0.229)   
year2012 1.231 *** -1.010 * 0.218     -0.178 *   -0.127     -0.188     0.230     -0.105    
(0.231)    (0.500)  (0.309)    (0.087)    (0.151)    (0.163)    (0.173)    (0.229)   
year2013 0.804 *** -0.764   -0.474     -0.369 *** -0.366 *   -0.060     -0.475 **  -0.255    
(0.231)    (0.500)  (0.309)    (0.087)    (0.151)    (0.163)    (0.173)    (0.229)   
year2014 0.933 *** -0.794   0.618 *   -0.254 **  -0.244     -0.185     0.852 *** -0.205    
(0.231)    (0.500)  (0.309)    (0.087)    (0.151)    (0.163)    (0.173)    (0.229)   
year2015 0.725 **  -0.276   -0.619 *   -0.128     -0.014     -0.250     0.493 **  -0.837 ***
(0.231)    (0.500)  (0.309)    (0.087)    (0.151)    (0.163)    (0.173)    (0.229)   
year2016 0.596 *   0.155   1.254 *** -0.042     0.299 *   0.203     0.964 *** -0.561 *  
(0.231)    (0.500)  (0.309)    (0.087)    (0.151)    (0.163)    (0.173)    (0.229)   
year2017 3.605 *** 1.026 * 0.409     -0.070     -0.021     0.285     1.529 *** 0.479 *  
(0.231)    (0.500)  (0.309)    (0.087)    (0.151)    (0.163)    (0.173)    (0.229)   
year2018 1.037 *** 0.443   0.128     -0.040     -0.169     -0.743 *** 1.901 *** 0.494 *  
(0.231)    (0.500)  (0.309)    (0.087)    (0.151)    (0.163)    (0.173)    (0.229)   
year2019 1.208 *** 0.631   0.063     -0.072     0.164     0.201     1.433 *** -0.035    
(0.231)    (0.500)  (0.309)    (0.087)    (0.151)    (0.163)    (0.173)    (0.229)   
year2020 1.235 *** 0.283   -0.135     0.009     0.452 **  0.096     1.503 *** 1.153 ***
(0.231)    (0.500)  (0.309)    (0.087)    (0.151)    (0.163)    (0.173)    (0.229)   
year2021 1.148 *** -0.008   -0.181     0.568 *** -0.164     0.671 *** 1.099 *** 1.179 ***
(0.231)    (0.500)  (0.309)    (0.087)    (0.151)    (0.163)    (0.173)    (0.229)   
N 2170         2002       2170         2170         2170         2170         2170         2170        
R2 0.140     0.007   0.031     0.056     0.035     0.052     0.109     0.082    
*** p < 0.001; ** p < 0.01; * p < 0.05.
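
The seventeen model fits above differ only in their response variable, so they can also be generated from a vector of outcome names. The following is a minimal sketch of that pattern on simulated data, not the code used for the tables: `panel_demo`, its columns and the coding of `after2015` as `year > 2015` are assumptions, and base R `lm()` with country dummies stands in for `plm(..., model = "within")`, since both give the same slope estimates.

```r
# Simulated panel: 5 hypothetical countries observed from 2009 to 2021
set.seed(1)
panel_demo <- expand.grid(country = paste0("C", 1:5), year = 2009:2021)
panel_demo$after2015 <- as.integer(panel_demo$year > 2015)  # assumed coding

# Hypothetical outcome columns, filled with noise for illustration
outcomes <- c("diff_overallscore", "diff_goal1", "diff_goal2")
for (y in outcomes) panel_demo[[y]] <- rnorm(nrow(panel_demo))

# Fit the same specification for one outcome
fit_one <- function(outcome) {
  f <- reformulate(
    c("after2015", "year", "after2015:year", "factor(country)"),
    response = outcome
  )
  lm(f, data = panel_demo)
}

# One fitted model per outcome, in a named list like the one passed to huxreg()
models_demo <- setNames(lapply(outcomes, fit_one), outcomes)
names(models_demo)
```

Adding or removing a goal then only requires changing the `outcomes` vector.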

5 Analysis

5.1 Answers to the research questions

5.1.1 Focus on relationship between SDGs

How are the different SDGs linked? (We want to see whether some SDGs move together, in the sense that a high score on one goes with a high score on another, and thus whether we can form groups of comparable SDGs.)

5.2 Focus on relationship between SDGs

Let’s analyse the relationships between the SDGs. We import our dataset and keep only the columns holding the goal scores. We then build a correlation matrix and highlight only the pairs of goals whose correlation coefficient is greater than 0.5 (a strong positive relationship) or less than -0.5 (a strong negative relationship). This lets us identify and analyse the most significant relationships between the SDGs.

Code
data_4 <- read.csv(here::here("scripts", "data", "data_question24.csv"))
# Keep the goal columns, then drop incomplete rows
# (na.omit() on a data frame ignores a `cols` argument and checks every column)
goals_data_4_cl <- data_4[, grepl("goal", names(data_4))]
goals_data_4_cl <- na.omit(goals_data_4_cl)

Given that our variables do not follow a normal distribution, the Pearson correlation is not well suited to our analysis. We tried to normalize the data with logarithmic and square-root transformations, but these were not effective enough. Consequently, we compute the Spearman correlation. While not ideal, this rank-based method does not require normally distributed data. For the heatmap, we keep only correlations above 0.5 or below -0.5, which makes it easier to read and interpret.

Code
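threshold_heatmap <- 0.5  # cutoff stated in the text above
spearman_corr_4_cl <- cor(goals_data_4_cl, method = "spearman", use = "everything")
# Mask weak correlations so that only |rho| >= threshold_heatmap is drawn
spearman_corr_4_cl[abs(spearman_corr_4_cl) < threshold_heatmap] <- NA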
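
Spearman's coefficient is the Pearson correlation computed on ranks, so it depends only on the ordering of the values. A small sketch on simulated data (the variables `x` and `y` are illustrative and not taken from the dataset) shows this, and also that a monotone transformation such as the logarithm leaves the coefficient unchanged:

```r
set.seed(42)
x <- rexp(200)                          # skewed, clearly non-normal
y <- x^2 + rnorm(200, sd = 0.5)         # monotone but non-linear link

rho       <- cor(x, y, method = "spearman")
rho_ranks <- cor(rank(x), rank(y))      # Pearson correlation of the ranks
rho_log   <- cor(log(x), y, method = "spearman")

all.equal(rho, rho_ranks)               # same quantity
all.equal(rho, rho_log)                 # unchanged by log()
```

This is why the logarithmic and square-root transformations matter for Pearson but have no effect on the Spearman results.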

We can then plot a heatmap of the Spearman correlations using the ggplot2 package.

Code
# Melting the matrix into long format (melt() comes from the reshape2 package)
melted_corr_4 <- melt(spearman_corr_4_cl, na.rm = TRUE)

# Creating the heatmap
ggplot(data = melted_corr_4, aes(x = Var1, y = Var2, fill = value)) +
    geom_tile() +
    geom_text(aes(label = sprintf("%.2f", value)), vjust = 0.5, size=2.5) + # Adding text
    scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
                         midpoint = 0, limit = c(-1,1), space = "Lab", 
                         name="Spearman\nCorrelation",
                         na.value = "grey") +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    labs(title = "Heatmap of Spearman Correlations for Goals", 
         x = "", y = "")

The heatmap shows that the Sustainable Development Goals (SDGs) are closely interconnected, although some goals are less related to the rest than others. Specifically, SDG 1 (No Poverty) and SDG 10 (Reduced Inequalities) are more weakly correlated with the other goals. Similarly, Goal 15 (Life on Land) is less connected to the other SDGs.

Code
# Selecting only numeric columns, assuming they are named as 'goal1', 'goal2', etc.
goals_data <- goals_data_4_cl[, grep('goal', names(goals_data_4_cl))]
goals_data_scaled <- scale(goals_data) # Scaling the data
pca_result <- prcomp(goals_data_scaled) # Running PCA

# Summary of PCA - shows variance explained by each principal component
# summary(pca_result)

# Plotting Scree plot to visualize the importance of each principal component
fviz_eig(pca_result,
         addlabels = TRUE) +
  theme_minimal()

# Plotting Biplot to visualize the two main PCs
fviz_pca_biplot(pca_result,
                label="var",
                col.var="dodgerblue3",
                geom="point",
                pointsize = 0.1,
                labelsize = 5) +
  theme_minimal()
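
The scree plot displays the share of total variance carried by each principal component. As a minimal sketch of how these shares follow from a `prcomp()` result (simulated data with hypothetical column names `g1` to `g3`, not the SDG scores), the variance of each component is its squared standard deviation:

```r
set.seed(123)
n <- 300
latent <- rnorm(n)                       # shared factor driving g1 and g2
demo <- data.frame(
  g1 = latent + rnorm(n, sd = 0.3),
  g2 = latent + rnorm(n, sd = 0.3),
  g3 = rnorm(n)                          # unrelated variable
)

pca_demo <- prcomp(scale(demo))          # PCA on standardized columns

# Share of variance explained by each component (what the scree plot shows)
var_explained <- pca_demo$sdev^2 / sum(pca_demo$sdev^2)
round(var_explained, 3)

# Loadings: weight of each variable in each component
round(pca_demo$rotation, 2)
```

Because the columns are standardized, the component variances sum to the number of variables, and variables that move together load on the same component.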

In the EDA on the influence of the factors on the SDG scores, we built a correlation-matrix heatmap covering every variable in our dataset. Here, we zoom in on parts of that heatmap. The plots display a correlation only when its p-value is significant (alpha = 0.05); the grey cells mark non-significant correlations.

Let’s first look at the correlation heatmap between our SDG goals and all the other variables.

Code
vars <- data_question1[7:40]   # columns used for the correlation analysis
n_vars <- ncol(vars)
corr_matrix <- cor(vars)
# p-value of the Pearson correlation test for every pair of variables
p_matrix2 <- matrix(nrow = n_vars, ncol = n_vars)
for (i in seq_len(n_vars)) {
  for (j in seq_len(n_vars)) {
    p_matrix2[i, j] <- cor.test(vars[, i], vars[, j])$p.value
  }
}

#Switch population at the end of heatmap

corr_matrix[which(p_matrix2 > 0.05)] <- NA
melted_corr_matrix_GVar <- melt(corr_matrix[19:34,1:18])
ggplot(melted_corr_matrix_GVar, aes(Var1, Var2, fill = value)) +
  geom_tile() +
  geom_text(aes(label = ifelse(!is.na(value), sprintf("%.2f", value), '')),
            color = "black", size = 2) +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limit = c(-1, 1), space = "Lab",
                       name = "Pearson\nCorrelation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.text.y = element_text(angle = 45, hjust = 1)) +
  labs(x = 'Other variables', y = 'Goals',
       title = 'Correlations Heatmap between goals and our other variables')
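
The masking step used for this heatmap, keeping a correlation only when its test is significant, can be wrapped in a small helper. This is a sketch rather than the report's code: `masked_cor()` and the simulated columns are hypothetical names, and only the upper triangle is tested because the p-value matrix is symmetric.

```r
# Return the Pearson correlation matrix of df with non-significant cells set to NA
masked_cor <- function(df, alpha = 0.05) {
  k <- ncol(df)
  r <- cor(df)
  p <- matrix(0, k, k, dimnames = dimnames(r))   # diagonal p-values stay at 0
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      p[i, j] <- p[j, i] <- cor.test(df[[i]], df[[j]])$p.value
    }
  }
  r[p > alpha] <- NA
  r
}

set.seed(7)
demo <- data.frame(x = rnorm(100))
demo$y <- demo$x + rnorm(100, sd = 0.2)   # strongly related to x
demo$z <- rnorm(100)                      # independent noise
masked_cor(demo)
```

Testing each pair once halves the number of `cor.test()` calls compared with looping over the full matrix.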

As we can see, SDG goals 12 and 13 (responsible consumption and production, and climate action) are negatively correlated with most of our variables, as is the ef_government variable (the government component of economic freedom) with our SDG goals. In that sense, higher scores on the Human Freedom Index components are associated with lower scores on these two goals: for example, the better people in a country can access and afford civil justice, the lower its scores on goals 12 and 13 tend to be.

Nevertheless, goals 12 and 13 and ef_government are positively correlated with one another. In addition, some variables such as internet_usage, pf_law or ef_legal are strongly correlated with most of our SDG goals. This is most likely due to the broad scope of these variables: they touch many sectors of the economy and are therefore related to most SDG goals.

Now let’s zoom in on the correlations between all our variables except the SDG goals:

::: {.cell layout-align=“center” hash=‘report_cache/html/unnamed-chunk-301_5234ec6450c97b7fe39ebe59fb723595’}

Code
melted_corr_matrix_Var <- melt(corr_matrix[19:34,19:34])
ggplot(melted_corr_matrix_Var, aes(Var1, Var2, fill = value)) +
  geom_tile() +
  geom_text(aes(label = ifelse(!is.na(value), sprintf("%.2f", value), '')),
            color = "black", size = 1.7) +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limit = c(-1, 1), space = "Lab",
                       name = "Pearson\nCorrelation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.text.y = element_text(angle = 45, hjust = 1)) +
  labs(x = 'Variables', y = 'Variables',
       title = 'Correlations Heatmap between other variables than SDG goals')

:::

We noticed high multicollinearity in our regressions. Therefore, before computing them, we try to drop one variable from each pair whose absolute correlation is at least 0.8.

::: {.cell layout-align=“center” hash=‘report_cache/html/unnamed-chunk-302_49883eb50ba0217c5c63d1b2ce101ab1’}

Code
correlation_overall_matrix <- cor(Correlation_overall, use = "everything")
high_cor_pairs <- which(abs(correlation_overall_matrix) >= 0.8, arr.ind = TRUE)

# Displaying the results
for (i in 1:nrow(high_cor_pairs)) {
  row <- high_cor_pairs[i, "row"]
  col <- high_cor_pairs[i, "col"]
  
  # Avoiding duplicate pairs and diagonal elements
  if (row < col) {
    cat(sprintf("Variables: %s and %s, Correlation: %f\n", 
                names(Correlation_overall)[row], names(Correlation_overall)[col], correlation_overall_matrix[row, col]))
  }
}
#> Variables: overallscore and goal1, Correlation: 0.890397
#> Variables: overallscore and goal3, Correlation: 0.942610
#> Variables: goal1 and goal3, Correlation: 0.894344
#> Variables: overallscore and goal4, Correlation: 0.872865
#> Variables: goal1 and goal4, Correlation: 0.822114
#> Variables: goal3 and goal4, Correlation: 0.851160
#> Variables: overallscore and goal6, Correlation: 0.904641
#> Variables: goal3 and goal6, Correlation: 0.874924
#> Variables: overallscore and goal7, Correlation: 0.901153
#> Variables: goal1 and goal7, Correlation: 0.867157
#> Variables: goal3 and goal7, Correlation: 0.879306
#> Variables: goal4 and goal7, Correlation: 0.822145
#> Variables: goal6 and goal7, Correlation: 0.829524
#> Variables: overallscore and goal9, Correlation: 0.834913
#> Variables: goal3 and goal9, Correlation: 0.811679
#> Variables: overallscore and goal11, Correlation: 0.885962
#> Variables: goal3 and goal11, Correlation: 0.873018
#> Variables: goal4 and goal11, Correlation: 0.831999
#> Variables: goal6 and goal11, Correlation: 0.823253
#> Variables: goal7 and goal11, Correlation: 0.841304
#> Variables: goal9 and goal12, Correlation: -0.836629
#> Variables: goal12 and goal13, Correlation: 0.887101
#> Variables: overallscore and goal16, Correlation: 0.814057
#> Variables: goal12 and goal16, Correlation: -0.818053
#> Variables: goal9 and GDPpercapita, Correlation: 0.811971
#> Variables: goal12 and GDPpercapita, Correlation: -0.848970
#> Variables: overallscore and internet_usage, Correlation: 0.804896
#> Variables: goal9 and internet_usage, Correlation: 0.891350
#> Variables: goal12 and pf_law, Correlation: -0.850543
#> Variables: goal16 and pf_law, Correlation: 0.841794
#> Variables: pf_religion and pf_assembly, Correlation: 0.845992
#> Variables: pf_assembly and pf_expression, Correlation: 0.888191
#> Variables: goal9 and ef_legal, Correlation: 0.829291
#> Variables: goal12 and ef_legal, Correlation: -0.837627
#> Variables: goal16 and ef_legal, Correlation: 0.839944
#> Variables: pf_law and ef_legal, Correlation: 0.852443

# List of high-correlation pairs
correlation_pairs <- list(
  c("overallscore", "goal1"), c("overallscore", "goal3"), c("goal1", "goal3"),
  c("overallscore", "goal4"), c("goal1", "goal4"), c("goal3", "goal4"),
  c("overallscore", "goal6"), c("goal3", "goal6"),
  c("overallscore", "goal7"), c("goal1", "goal7"), c("goal3", "goal7"), c("goal4", "goal7"), c("goal6", "goal7"),
  c("overallscore", "goal9"), c("goal3", "goal9"),
  c("overallscore", "goal11"), c("goal3", "goal11"), c("goal4", "goal11"), c("goal6", "goal11"), c("goal7", "goal11"),
  c("goal9", "goal12"), c("goal12", "goal13"),
  c("overallscore", "goal16"), c("goal12", "goal16"),
  c("goal9", "GDPpercapita"), c("goal12", "GDPpercapita"),
  c("overallscore", "internet_usage"), c("goal9", "internet_usage"),
  c("goal12", "pf_law"), c("goal16", "pf_law"),
  c("pf_religion", "pf_assembly"), c("pf_assembly", "pf_expression"),
  c("goal9", "ef_legal"), c("goal12", "ef_legal"), c("goal16", "ef_legal"), c("pf_law", "ef_legal")
)

# Flatten the list and count the frequency of each variable
flattened_list <- unlist(correlation_pairs)
frequency_count <- table(flattened_list)
variables_to_remove <- c()

for (pair in correlation_pairs) {
  # Select the variable that appears more frequently for removal
  if (frequency_count[pair[1]] > frequency_count[pair[2]]) {
    variables_to_remove <- c(variables_to_remove, pair[1])
  } else if (frequency_count[pair[1]] < frequency_count[pair[2]]) {
    variables_to_remove <- c(variables_to_remove, pair[2])
  } else {
    # If both appear equally, arbitrarily choose one to remove
    variables_to_remove <- c(variables_to_remove, pair[1])
  }
}

variables_to_remove <- unique(variables_to_remove)
variables_to_remove <- sort(variables_to_remove)
print(variables_to_remove) 
#>  [1] "ef_legal"     "goal11"       "goal12"       "goal16"      
#>  [5] "goal3"        "goal4"        "goal7"        "goal9"       
#>  [9] "overallscore" "pf_assembly"

:::

Therefore, we will leave the variables ef_legal, goal11, goal12, goal16, goal3, goal4, goal7, goal9, overallscore and pf_assembly out of our regressions to limit multicollinearity.
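
The selection loop above handles each pair separately, so a variable can be dropped even though its partner in that pair is dropped as well (goal9 and goal12, for example). One possible alternative, sketched here under our own assumptions (the helper `drop_correlated()` and the toy matrix are hypothetical and were not used for the results in this report), removes one variable at a time and recomputes the remaining pairs after each removal:

```r
# Greedy removal: drop the variable involved in the most high-correlation
# pairs, recompute, and stop once no remaining pair reaches the cutoff
drop_correlated <- function(cor_mat, cutoff = 0.8) {
  keep <- colnames(cor_mat)
  repeat {
    m <- abs(cor_mat[keep, keep, drop = FALSE])
    diag(m) <- 0                                  # ignore self-correlations
    n_high <- rowSums(m >= cutoff)
    if (all(n_high == 0)) break
    keep <- setdiff(keep, names(which.max(n_high)))
  }
  setdiff(colnames(cor_mat), keep)                # variables that were dropped
}

# Toy matrix of pairwise correlations (illustrative values only)
vars <- c("a", "b", "c", "d")
toy <- matrix(c(1.00, 0.90, 0.85, 0.10,
                0.90, 1.00, 0.70, 0.10,
                0.85, 0.70, 1.00, 0.10,
                0.10, 0.10, 0.10, 1.00),
              nrow = 4, dimnames = list(vars, vars))
drop_correlated(toy)
```

In this toy case only `a` is dropped, because once it is removed no remaining pair reaches the 0.8 cutoff.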

Now, let’s compute the regressions without these variables.

::: {.cell layout-align=“center” hash=‘report_cache/html/unnamed-chunk-303_2a269031b5a383a83f13ea0b0e01e96c’}

Code
reg_goal1_all_new <- lm(goal1 ~ goal2 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal2_all_new <- lm(goal2 ~ goal1 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal3_all_new <- lm(goal3 ~ goal1 + goal2 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal4_all_new <- lm(goal4 ~ goal1 + goal2 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal5_all_new <- lm(goal5 ~ goal1 + goal2 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal6_all_new <- lm(goal6 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal7_all_new <- lm(goal7 ~ goal1 + goal2 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal8_all_new <- lm(goal8 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal9_all_new <- lm(goal9 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal10_all_new <- lm(goal10 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal11_all_new <- lm(goal11 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal12_all_new <- lm(goal12 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + goal11 + goal13 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal13_all_new <- lm(goal13 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + goal11 + goal12 + goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal15_all_new <- lm(goal15 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + goal11 + goal12 + goal13 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal16_all_new <- lm(goal16 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + goal11 + goal12 + goal13 + goal15 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
reg_goal17_all_new <- lm(goal17 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + goal11 + goal12 + goal13 + goal15 + goal16 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_law + pf_security + pf_movement + pf_religion + pf_expression + pf_identity + ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)

:::

Even after removing these variables, some multicollinearity may remain. We therefore need to analyse the variance inflation factor (VIF) of each regression and adapt the model accordingly.

::: {.cell layout-align=“center” hash=‘report_cache/html/unnamed-chunk-304_58559eb8209b09d04d2cb4eaff2c120d’}

Code
#for reg1
nullmod <- lm(goal1 ~ 1, data = data_question1)
selmod <- step(reg_goal1_all_new, scope=list(lower=nullmod, upper=reg_goal1_all_new), direction="backward") 
#> Start:  AIC=13300
#> goal1 ~ goal2 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - pf_expression                  1         6 502174 13298
#> - pf_security                    1        29 502197 13298
#> - population                     1        75 502243 13299
#> - pf_identity                    1       231 502399 13299
#> - goal2                          1       282 502450 13300
#> <none>                                       502168 13300
#> - pf_law                         1       710 502878 13302
#> - ef_regulation                  1      1437 503605 13305
#> - ef_money                       1      2918 505086 13313
#> - pf_movement                    1      3321 505489 13315
#> - MilitaryExpenditurePercentGDP  1      8654 510823 13341
#> - goal5                          1      9242 511410 13344
#> - goal8                          1      9267 511435 13344
#> - ef_trade                       1     13538 515707 13365
#> - goal13                         1     17048 519216 13382
#> - GDPpercapita                   1     17962 520131 13386
#> - internet_usage                 1     19075 521243 13391
#> - goal15                         1     22916 525084 13410
#> - goal10                         1     24623 526791 13418
#> - goal17                         1     27596 529764 13432
#> - unemployment.rate              1     28600 530768 13437
#> - pf_religion                    1     32572 534740 13455
#> - ef_government                  1     42521 544689 13501
#> - goal6                          1    134385 636553 13891
#> 
#> Step:  AIC=13298
#> goal1 ~ goal2 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - pf_security                    1        28 502202 13296
#> - population                     1        93 502267 13297
#> - pf_identity                    1       254 502428 13298
#> - goal2                          1       284 502458 13298
#> <none>                                       502174 13298
#> - pf_law                         1       714 502888 13300
#> - ef_regulation                  1      1431 503605 13303
#> - ef_money                       1      2940 505114 13311
#> - pf_movement                    1      3462 505636 13314
#> - MilitaryExpenditurePercentGDP  1      8872 511046 13340
#> - goal5                          1      9238 511412 13342
#> - goal8                          1      9557 511731 13343
#> - ef_trade                       1     13566 515740 13363
#> - goal13                         1     17042 519216 13380
#> - GDPpercapita                   1     18180 520354 13385
#> - internet_usage                 1     19677 521851 13392
#> - goal15                         1     23218 525392 13409
#> - goal10                         1     24693 526867 13416
#> - goal17                         1     27953 530127 13432
#> - unemployment.rate              1     28638 530812 13435
#> - ef_government                  1     42517 544691 13499
#> - pf_religion                    1     43879 546053 13506
#> - goal6                          1    135324 637498 13893
#> 
#> Step:  AIC=13296
#> goal1 ~ goal2 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_movement + pf_religion + pf_identity + 
#>     ef_government + ef_money + ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - population                     1        95 502297 13295
#> - pf_identity                    1       246 502448 13296
#> - goal2                          1       333 502535 13296
#> <none>                                       502202 13296
#> - pf_law                         1       820 503022 13299
#> - ef_regulation                  1      1438 503640 13302
#> - ef_money                       1      2981 505183 13309
#> - pf_movement                    1      3756 505958 13313
#> - MilitaryExpenditurePercentGDP  1      8892 511093 13338
#> - goal5                          1      9621 511823 13342
#> - goal8                          1      9792 511994 13343
#> - ef_trade                       1     13690 515892 13362
#> - goal13                         1     17019 519221 13378
#> - GDPpercapita                   1     18304 520505 13384
#> - internet_usage                 1     19991 522193 13392
#> - goal15                         1     23625 525827 13409
#> - goal10                         1     27378 529580 13427
#> - goal17                         1     27935 530137 13430
#> - unemployment.rate              1     28750 530951 13434
#> - ef_government                  1     43042 545244 13500
#> - pf_religion                    1     44913 547115 13509
#> - goal6                          1    136033 638234 13893
#> 
#> Step:  AIC=13295
#> goal1 ~ goal2 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_movement + pf_religion + pf_identity + 
#>     ef_government + ef_money + ef_trade + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - pf_identity                    1       203 502500 13294
#> - goal2                          1       291 502588 13294
#> <none>                                       502297 13295
#> - pf_law                         1       812 503109 13297
#> - ef_regulation                  1      1453 503750 13300
#> - ef_money                       1      3064 505361 13308
#> - pf_movement                    1      3734 506031 13311
#> - MilitaryExpenditurePercentGDP  1      8837 511134 13337
#> - goal5                          1      9605 511902 13340
#> - goal8                          1      9752 512049 13341
#> - ef_trade                       1     14107 516404 13362
#> - goal13                         1     17076 519373 13376
#> - GDPpercapita                   1     18485 520782 13383
#> - internet_usage                 1     20051 522348 13391
#> - goal15                         1     23696 525993 13408
#> - unemployment.rate              1     28659 530956 13432
#> - goal10                         1     28780 531077 13432
#> - goal17                         1     29163 531460 13434
#> - ef_government                  1     43208 545505 13499
#> - pf_religion                    1     47394 549691 13518
#> - goal6                          1    138086 640383 13900
#> 
#> Step:  AIC=13294
#> goal1 ~ goal2 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_movement + pf_religion + ef_government + 
#>     ef_money + ef_trade + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - goal2                          1       242 502742 13293
#> <none>                                       502500 13294
#> - pf_law                         1       810 503310 13296
#> - ef_regulation                  1      1637 504136 13300
#> - ef_money                       1      3175 505675 13308
#> - pf_movement                    1      3853 506353 13311
#> - MilitaryExpenditurePercentGDP  1      8689 511189 13335
#> - goal5                          1      9527 512026 13339
#> - goal8                          1     10241 512741 13342
#> - ef_trade                       1     14629 517129 13364
#> - goal13                         1     17088 519588 13376
#> - GDPpercapita                   1     18722 521221 13383
#> - internet_usage                 1     19890 522390 13389
#> - goal15                         1     23511 526010 13406
#> - goal10                         1     28622 531121 13430
#> - unemployment.rate              1     28970 531469 13432
#> - goal17                         1     29418 531918 13434
#> - ef_government                  1     43794 546294 13501
#> - pf_religion                    1     47198 549697 13516
#> - goal6                          1    162178 664678 13991
#> 
#> Step:  AIC=13293
#> goal1 ~ goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + 
#>     unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_movement + pf_religion + ef_government + 
#>     ef_money + ef_trade + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> <none>                                       502742 13293
#> - pf_law                         1       812 503554 13295
#> - ef_regulation                  1      1757 504499 13300
#> - ef_money                       1      3055 505797 13306
#> - pf_movement                    1      3774 506515 13310
#> - MilitaryExpenditurePercentGDP  1      8580 511322 13333
#> - goal5                          1      9303 512045 13337
#> - goal8                          1     11539 514281 13348
#> - ef_trade                       1     14785 517527 13364
#> - goal13                         1     16857 519598 13374
#> - GDPpercapita                   1     18695 521437 13382
#> - internet_usage                 1     20501 523243 13391
#> - goal15                         1     24250 526992 13409
#> - goal10                         1     28643 531385 13430
#> - goal17                         1     29177 531919 13432
#> - unemployment.rate              1     29481 532223 13434
#> - ef_government                  1     43569 546311 13499
#> - pf_religion                    1     47922 550664 13519
#> - goal6                          1    178536 681278 14051
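As a sanity check on the AIC column above, the values that `step()` prints for a linear model can be reproduced by hand. For `lm` fits, `step()` relies on `extractAIC()`, which returns n * log(RSS / n) + 2 * p (up to an additive constant), where p counts the estimated coefficients, intercept included. The sketch below plugs in the rounded figures from the final step and takes n = 2499 from the 2480 residual degrees of freedom reported by `summary(selmod)` plus the 19 coefficients. The helper name `aic_step` is ours and is not part of the report's code.

```r
# AIC as printed by step() for lm fits: n * log(RSS / n) + 2 * p
# (hypothetical helper; p counts all coefficients, intercept included)
aic_step <- function(rss, n, p) n * log(rss / n) + 2 * p

# Residual df + number of coefficients in the final goal1 model
n <- 2480 + 19

aic_final  <- aic_step(rss = 502742, n = n, p = 19)  # "<none>" row
aic_no_law <- aic_step(rss = 503554, n = n, p = 18)  # "- pf_law" row

round(c(final = aic_final, without_pf_law = aic_no_law))
```

Both rounded values should agree with the `<none>` and `- pf_law` rows of the last table, which is why removing `pf_law` is not chosen by the AIC criterion alone.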
summary(selmod)
#> 
#> Call:
#> lm(formula = goal1 ~ goal5 + goal6 + goal8 + goal10 + goal13 + 
#>     goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_movement + pf_religion + ef_government + 
#>     ef_money + ef_trade + ef_regulation, data = data_question1)
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#> -44.93  -8.71   0.10   9.05  39.96 
#> 
#> Coefficients:
#>                                Estimate Std. Error t value Pr(>|t|)
#> (Intercept)                   -4.25e+01   5.82e+00   -7.29  4.0e-13
#> goal5                         -1.88e-01   2.77e-02   -6.77  1.6e-11
#> goal6                          1.10e+00   3.72e-02   29.68  < 2e-16
#> goal8                          4.30e-01   5.70e-02    7.54  6.3e-14
#> goal10                         1.68e-01   1.41e-02   11.89  < 2e-16
#> goal13                        -2.50e-01   2.74e-02   -9.12  < 2e-16
#> goal15                        -2.70e-01   2.46e-02  -10.94  < 2e-16
#> goal17                         3.72e-01   3.10e-02   12.00  < 2e-16
#> unemployment.rate              8.20e+01   6.80e+00   12.06  < 2e-16
#> GDPpercapita                  -2.99e-04   3.11e-05   -9.60  < 2e-16
#> MilitaryExpenditurePercentGDP  1.87e+00   2.88e-01    6.51  9.3e-11
#> internet_usage                 1.79e+01   1.78e+00   10.06  < 2e-16
#> pf_law                         8.32e-01   4.16e-01    2.00  0.04544
#> pf_movement                    1.38e+00   3.19e-01    4.31  1.7e-05
#> pf_religion                   -4.22e+00   2.74e-01  -15.38  < 2e-16
#> ef_government                  4.62e+00   3.15e-01   14.66  < 2e-16
#> ef_money                      -1.19e+00   3.07e-01   -3.88  0.00011
#> ef_trade                       3.45e+00   4.04e-01    8.54  < 2e-16
#> ef_regulation                 -1.22e+00   4.15e-01   -2.94  0.00327
#>                                  
#> (Intercept)                   ***
#> goal5                         ***
#> goal6                         ***
#> goal8                         ***
#> goal10                        ***
#> goal13                        ***
#> goal15                        ***
#> goal17                        ***
#> unemployment.rate             ***
#> GDPpercapita                  ***
#> MilitaryExpenditurePercentGDP ***
#> internet_usage                ***
#> pf_law                        *  
#> pf_movement                   ***
#> pf_religion                   ***
#> ef_government                 ***
#> ef_money                      ***
#> ef_trade                      ***
#> ef_regulation                 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 14.2 on 2480 degrees of freedom
#> Multiple R-squared:  0.805,  Adjusted R-squared:  0.804 
#> F-statistic:  570 on 18 and 2480 DF,  p-value: <2e-16
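The `Sum of Sq` column of the last `step()` table and the t values in this summary describe the same single-term tests: dropping one coefficient gives F = (Sum of Sq) / (RSS / residual df), and that F equals the squared t statistic. A quick check with the printed, rounded figures for `pf_law` and `goal6` follows; small discrepancies are expected from rounding in the output, and the variable names are illustrative.

```r
# Figures copied from the final step() table and the summary above
rss      <- 502742
df_resid <- 2480
sigma2   <- rss / df_resid          # residual variance estimate

sum_sq <- c(pf_law = 812, goal6 = 178536)

# |t| recovered from the drop-one sums of squares: sqrt(F) = sqrt(SS / sigma^2)
t_from_ss <- sqrt(sum_sq / sigma2)

round(t_from_ss, 2)   # compare with the t values for pf_law and goal6
round(sqrt(sigma2), 1) # compare with the residual standard error
```

This also explains why `pf_law`, the term with the smallest `Sum of Sq`, is the one with the weakest significance in the coefficient table.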
vif(selmod) # pf_law has the largest VIF (5.24, above 5): drop it and refit
#>                         goal5                         goal6 
#>                          2.54                          4.04 
#>                         goal8                        goal10 
#>                          3.50                          1.98 
#>                        goal13                        goal15 
#>                          3.60                          1.31 
#>                        goal17             unemployment.rate 
#>                          1.81                          1.98 
#>                  GDPpercapita MilitaryExpenditurePercentGDP 
#>                          4.29                          1.40 
#>                internet_usage                        pf_law 
#>                          3.69                          5.24 
#>                   pf_movement                   pf_religion 
#>                          3.29                          2.84 
#>                 ef_government                      ef_money 
#>                          1.65                          2.55 
#>                      ef_trade                 ef_regulation 
#>                          3.72                          2.23
reg_goal1_all_new <- lm(goal1 ~ goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + 
                          unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
                          internet_usage + pf_movement + pf_religion + ef_government + 
                          ef_money + ef_trade + ef_regulation, data = data_question1)
selmod <- step(reg_goal1_all_new, scope=list(lower=nullmod, upper=reg_goal1_all_new), direction="backward") 
#> Start:  AIC=13295
#> goal1 ~ goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + 
#>     unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_movement + pf_religion + ef_government + 
#>     ef_money + ef_trade + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> <none>                                       503554 13295
#> - ef_regulation                  1      1333 504887 13300
#> - ef_money                       1      3201 506755 13309
#> - pf_movement                    1      4356 507910 13315
#> - goal5                          1      9049 512603 13338
#> - MilitaryExpenditurePercentGDP  1      9345 512899 13339
#> - goal8                          1     12741 516295 13356
#> - ef_trade                       1     16228 519782 13372
#> - GDPpercapita                   1     17989 521543 13381
#> - goal13                         1     18696 522249 13384
#> - internet_usage                 1     20330 523884 13392
#> - goal15                         1     24992 528545 13414
#> - goal17                         1     28396 531950 13430
#> - goal10                         1     31840 535394 13446
#> - unemployment.rate              1     35281 538835 13462
#> - ef_government                  1     43510 547064 13500
#> - pf_religion                    1     47763 551317 13520
#> - goal6                          1    179848 683402 14056
vif(selmod)
#>                         goal5                         goal6 
#>                          2.53                          4.03 
#>                         goal8                        goal10 
#>                          3.42                          1.90 
#>                        goal13                        goal15 
#>                          3.49                          1.30 
#>                        goal17             unemployment.rate 
#>                          1.78                          1.81 
#>                  GDPpercapita MilitaryExpenditurePercentGDP 
#>                          3.95                          1.38 
#>                internet_usage                   pf_movement 
#>                          3.69                          3.23 
#>                   pf_religion                 ef_government 
#>                          2.67                          1.54 
#>                      ef_money                      ef_trade 
#>                          2.54                          3.63 
#>                 ef_regulation 
#>                          2.12
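The cycle used here for `goal1` (backward selection, VIF check, drop a term, refit) is repeated by hand for each goal. One way to automate it is sketched below, under stated assumptions: all predictors are numeric, the term with the largest VIF is dropped whenever it exceeds 5 (the cutoff the manual steps appear to follow), and VIFs are computed in base R from the inverse of the predictor correlation matrix rather than with `car::vif()`. The function names `vif_base` and `prune_by_vif` are illustrative, and the example runs on the built-in `mtcars` data, not on `data_question1`.

```r
# VIFs for a model with an intercept and numeric predictors only:
# the diagonal of the inverse correlation matrix of the predictors.
vif_base <- function(model) {
  X <- model.matrix(model)[, -1, drop = FALSE]  # drop the intercept column
  v <- diag(solve(cor(X)))
  names(v) <- colnames(X)
  v
}

# Alternate backward AIC selection and VIF pruning until every VIF <= threshold.
prune_by_vif <- function(model, threshold = 5) {
  repeat {
    model <- step(model, direction = "backward", trace = 0)
    # With fewer than two terms there is no collinearity to check
    if (length(labels(terms(model))) < 2) return(model)
    v <- vif_base(model)
    if (max(v) <= threshold) return(model)
    worst <- names(which.max(v))                 # term with the largest VIF
    model <- update(model, as.formula(paste(". ~ . -", worst)))
  }
}

# Illustration on built-in data (assumption: stands in for data_question1)
full  <- lm(mpg ~ ., data = mtcars)
final <- prune_by_vif(full)
formula(final)
vif_base(final)
```

Dropping the single worst term per pass mirrors the manual procedure; an automated loop like this should still be reviewed, since the term with the largest VIF is not always the one that is least interesting to keep.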
# Final goal1 model (same specification as above; no remaining VIF exceeds 5)
reg_goal1_all_new <- lm(goal1 ~ goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + 
                          unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
                          internet_usage + pf_movement + pf_religion + ef_government + 
                          ef_money + ef_trade + ef_regulation, data = data_question1)
# Goal 2: backward selection starting from the full goal2 model
nullmod <- lm(goal2 ~ 1, data = data_question1)  # intercept-only lower bound for goal2
selmod <- step(reg_goal2_all_new, scope=list(lower=nullmod, upper=reg_goal2_all_new), direction="backward")
#> Start:  AIC=9578
#> goal2 ~ goal1 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - GDPpercapita                   1        14 113264 9577
#> - ef_government                  1        43 113293 9577
#> - pf_religion                    1        54 113304 9578
#> - goal1                          1        64 113314 9578
#> <none>                                       113250 9578
#> - pf_expression                  1        92 113342 9578
#> - MilitaryExpenditurePercentGDP  1       141 113391 9580
#> - goal10                         1       152 113402 9580
#> - pf_law                         1       166 113416 9580
#> - goal15                         1       187 113438 9581
#> - ef_trade                       1       242 113492 9582
#> - unemployment.rate              1       278 113529 9583
#> - goal17                         1       353 113604 9584
#> - internet_usage                 1       396 113646 9585
#> - pf_movement                    1       588 113839 9589
#> - ef_money                       1       645 113895 9591
#> - ef_regulation                  1      1041 114291 9599
#> - pf_identity                    1      1780 115030 9615
#> - goal13                         1      2012 115263 9620
#> - population                     1      2343 115593 9628
#> - goal5                          1      3195 116445 9646
#> - goal8                          1      3590 116841 9654
#> - pf_security                    1      4551 117801 9675
#> - goal6                          1      8584 121834 9759
#> 
#> Step:  AIC=9577
#> goal2 ~ goal1 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - ef_government                  1        45 113309 9576
#> - goal1                          1        55 113319 9576
#> - pf_religion                    1        55 113319 9576
#> - pf_expression                  1        86 113350 9577
#> <none>                                       113264 9577
#> - goal10                         1       148 113412 9578
#> - MilitaryExpenditurePercentGDP  1       152 113416 9578
#> - pf_law                         1       153 113417 9578
#> - goal15                         1       196 113460 9579
#> - ef_trade                       1       238 113502 9580
#> - unemployment.rate              1       266 113530 9581
#> - goal17                         1       342 113606 9582
#> - internet_usage                 1       513 113777 9586
#> - pf_movement                    1       602 113866 9588
#> - ef_money                       1       653 113917 9589
#> - ef_regulation                  1      1062 114326 9598
#> - pf_identity                    1      1794 115059 9614
#> - population                     1      2351 115616 9626
#> - goal13                         1      2420 115684 9628
#> - goal5                          1      3182 116446 9644
#> - goal8                          1      3586 116850 9653
#> - pf_security                    1      4537 117801 9673
#> - goal6                          1      8714 121978 9760
#> 
#> Step:  AIC=9576
#> goal2 ~ goal1 + goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - goal1                          1        32 113341 9574
#> - pf_religion                    1        40 113349 9575
#> - pf_expression                  1        88 113397 9576
#> <none>                                       113309 9576
#> - pf_law                         1       122 113431 9576
#> - goal10                         1       134 113443 9577
#> - MilitaryExpenditurePercentGDP  1       137 113446 9577
#> - goal15                         1       176 113485 9578
#> - ef_trade                       1       226 113535 9579
#> - goal17                         1       306 113615 9580
#> - unemployment.rate              1       323 113632 9581
#> - internet_usage                 1       522 113831 9585
#> - pf_movement                    1       609 113918 9587
#> - ef_money                       1       627 113936 9588
#> - ef_regulation                  1      1213 114522 9600
#> - pf_identity                    1      1836 115145 9614
#> - population                     1      2378 115687 9626
#> - goal13                         1      2379 115688 9626
#> - goal5                          1      3357 116666 9647
#> - goal8                          1      3715 117024 9654
#> - pf_security                    1      4758 118067 9677
#> - goal6                          1      8840 122149 9761
#> 
#> Step:  AIC=9574
#> goal2 ~ goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + 
#>     unemployment.rate + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_money + ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - pf_religion                    1        27 113369 9573
#> - pf_expression                  1        90 113431 9574
#> <none>                                       113341 9574
#> - goal10                         1       114 113456 9575
#> - MilitaryExpenditurePercentGDP  1       122 113464 9575
#> - pf_law                         1       135 113476 9575
#> - goal15                         1       230 113572 9578
#> - ef_trade                       1       272 113614 9578
#> - goal17                         1       282 113623 9579
#> - unemployment.rate              1       393 113734 9581
#> - internet_usage                 1       569 113910 9585
#> - pf_movement                    1       587 113928 9585
#> - ef_money                       1       612 113954 9586
#> - ef_regulation                  1      1205 114547 9599
#> - pf_identity                    1      1818 115159 9612
#> - goal13                         1      2349 115691 9624
#> - population                     1      2366 115707 9624
#> - goal5                          1      3342 116684 9645
#> - goal8                          1      3855 117196 9656
#> - pf_security                    1      4744 118086 9675
#> - goal6                          1     12148 125490 9827
#> 
#> Step:  AIC=9573
#> goal2 ~ goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + 
#>     unemployment.rate + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_expression + pf_identity + 
#>     ef_money + ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - pf_expression                  1        63 113431 9572
#> <none>                                       113369 9573
#> - pf_law                         1       123 113491 9574
#> - MilitaryExpenditurePercentGDP  1       123 113492 9574
#> - goal10                         1       160 113528 9575
#> - goal15                         1       220 113589 9576
#> - ef_trade                       1       262 113630 9577
#> - goal17                         1       311 113680 9578
#> - unemployment.rate              1       380 113749 9579
#> - pf_movement                    1       565 113934 9583
#> - internet_usage                 1       580 113948 9584
#> - ef_money                       1       606 113975 9584
#> - ef_regulation                  1      1196 114564 9597
#> - pf_identity                    1      1796 115165 9610
#> - goal13                         1      2441 115809 9624
#> - population                     1      2535 115904 9626
#> - goal5                          1      3318 116687 9643
#> - goal8                          1      3873 117242 9655
#> - pf_security                    1      4747 118116 9674
#> - goal6                          1     12123 125492 9825
#> 
#> Step:  AIC=9572
#> goal2 ~ goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + 
#>     unemployment.rate + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_identity + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> <none>                                       113431 9572
#> - MilitaryExpenditurePercentGDP  1        97 113529 9573
#> - goal10                         1       144 113576 9574
#> - pf_law                         1       219 113651 9575
#> - ef_trade                       1       254 113685 9576
#> - goal15                         1       265 113697 9576
#> - goal17                         1       337 113769 9578
#> - unemployment.rate              1       382 113813 9579
#> - ef_money                       1       587 114019 9583
#> - internet_usage                 1       646 114077 9585
#> - pf_movement                    1      1011 114442 9593
#> - ef_regulation                  1      1165 114597 9596
#> - pf_identity                    1      1742 115173 9609
#> - goal13                         1      2460 115892 9624
#> - population                     1      2480 115911 9624
#> - goal5                          1      3441 116873 9645
#> - goal8                          1      3865 117296 9654
#> - pf_security                    1      4827 118258 9675
#> - goal6                          1     12061 125493 9823
vif(selmod) # all VIFs are below 5: keep every selected term
#>                         goal5                         goal6 
#>                          2.61                          4.54 
#>                         goal8                        goal10 
#>                          3.68                          2.04 
#>                        goal13                        goal15 
#>                          2.78                          1.29 
#>                        goal17             unemployment.rate 
#>                          1.81                          1.87 
#> MilitaryExpenditurePercentGDP                internet_usage 
#>                          1.38                          3.36 
#>                        pf_law                   pf_security 
#>                          4.33                          1.95 
#>                   pf_movement                   pf_identity 
#>                          2.49                          2.39 
#>                      ef_money                      ef_trade 
#>                          2.56                          3.73 
#>                 ef_regulation                    population 
#>                          2.16                          1.21
reg_goal2_all_new <- lm(goal2 ~ goal5 + goal6 + goal8 + goal10 + goal13 + goal15 + goal17 + 
                          unemployment.rate + MilitaryExpenditurePercentGDP + internet_usage + 
                          pf_law + pf_security + pf_movement + pf_identity + ef_money + 
                          ef_trade + ef_regulation + population, data = data_question1)
# Goal 5: backward selection starting from the full goal5 model
nullmod <- lm(goal5 ~ 1, data = data_question1)  # intercept-only lower bound for goal5
selmod <- step(reg_goal5_all_new, scope=list(lower=nullmod, upper=reg_goal5_all_new), direction="backward")
#> Start:  AIC=11387
#> goal5 ~ goal1 + goal2 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - population                     1         1 233512 11385
#> - goal8                          1       158 233669 11386
#> <none>                                       233511 11387
#> - goal15                         1       351 233862 11389
#> - ef_trade                       1       360 233871 11389
#> - pf_expression                  1       380 233891 11389
#> - ef_money                       1       574 234085 11391
#> - pf_movement                    1       802 234313 11393
#> - pf_religion                    1      1039 234550 11396
#> - GDPpercapita                   1      1599 235110 11402
#> - pf_law                         1      2021 235532 11406
#> - goal6                          1      3636 237147 11423
#> - goal1                          1      4297 237808 11430
#> - unemployment.rate              1      4400 237911 11431
#> - ef_government                  1      4625 238135 11434
#> - pf_security                    1      5314 238825 11441
#> - goal13                         1      6035 239546 11449
#> - ef_regulation                  1      6210 239721 11450
#> - MilitaryExpenditurePercentGDP  1      6386 239897 11452
#> - goal2                          1      6587 240098 11454
#> - goal10                         1      8914 242425 11478
#> - internet_usage                 1     10693 244204 11497
#> - goal17                         1     12620 246131 11516
#> - pf_identity                    1     13120 246631 11521
#> 
#> Step:  AIC=11385
#> goal5 ~ goal1 + goal2 + goal6 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - goal8                          1       157 233669 11384
#> <none>                                       233512 11385
#> - ef_trade                       1       360 233872 11387
#> - goal15                         1       367 233879 11387
#> - pf_expression                  1       422 233934 11387
#> - ef_money                       1       573 234085 11389
#> - pf_movement                    1       806 234318 11391
#> - pf_religion                    1      1166 234678 11395
#> - GDPpercapita                   1      1601 235113 11400
#> - pf_law                         1      2027 235539 11404
#> - goal6                          1      3686 237198 11422
#> - goal1                          1      4297 237808 11428
#> - unemployment.rate              1      4421 237932 11430
#> - ef_government                  1      4625 238137 11432
#> - pf_security                    1      5315 238827 11439
#> - goal13                         1      6042 239554 11447
#> - ef_regulation                  1      6212 239724 11448
#> - MilitaryExpenditurePercentGDP  1      6428 239940 11451
#> - goal2                          1      6703 240215 11454
#> - goal10                         1      9138 242650 11479
#> - internet_usage                 1     10697 244209 11495
#> - goal17                         1     13096 246608 11519
#> - pf_identity                    1     13706 247218 11525
#> 
#> Step:  AIC=11384
#> goal5 ~ goal1 + goal2 + goal6 + goal10 + goal13 + goal15 + goal17 + 
#>     unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> <none>                                       233669 11384
#> - pf_expression                  1       327 233996 11386
#> - ef_trade                       1       329 233998 11386
#> - goal15                         1       386 234055 11387
#> - ef_money                       1       566 234235 11389
#> - pf_movement                    1       775 234444 11391
#> - pf_religion                    1      1146 234815 11395
#> - GDPpercapita                   1      1615 235284 11400
#> - pf_law                         1      2100 235769 11405
#> - goal6                          1      3843 237512 11423
#> - goal1                          1      4155 237824 11427
#> - ef_government                  1      4878 238547 11434
#> - pf_security                    1      5191 238860 11437
#> - goal13                         1      6162 239831 11448
#> - ef_regulation                  1      6174 239843 11448
#> - MilitaryExpenditurePercentGDP  1      6403 240072 11450
#> - unemployment.rate              1      7009 240678 11456
#> - goal2                          1      7424 241093 11461
#> - goal10                         1      9124 242793 11478
#> - internet_usage                 1     11180 244849 11499
#> - goal17                         1     13019 246688 11518
#> - pf_identity                    1     14556 248225 11534
vif(selmod) # goal6 has the largest VIF (6.26, above 5): drop it and refit
#>                         goal1                         goal2 
#>                          4.96                          1.95 
#>                         goal6                        goal10 
#>                          6.26                          2.19 
#>                        goal13                        goal15 
#>                          3.69                          1.41 
#>                        goal17             unemployment.rate 
#>                          1.85                          1.57 
#>                  GDPpercapita MilitaryExpenditurePercentGDP 
#>                          4.48                          1.44 
#>                internet_usage                        pf_law 
#>                          3.78                          5.69 
#>                   pf_security                   pf_movement 
#>                          2.07                          3.68 
#>                   pf_religion                 pf_expression 
#>                          3.87                          4.62 
#>                   pf_identity                 ef_government 
#>                          2.25                          1.78 
#>                      ef_money                      ef_trade 
#>                          2.60                          3.86 
#>                 ef_regulation 
#>                          2.25
reg_goal5_all_new <- lm(goal5 ~ goal1 + goal2 + goal10 + goal13 + goal15 + goal17 + 
                          unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
                          internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
                          pf_expression + pf_identity + ef_government + ef_money + 
                          ef_trade + ef_regulation, data = data_question1)
selmod <- step(reg_goal5_all_new, scope=list(lower=nullmod, upper=reg_goal5_all_new), direction="backward") 
#> Start:  AIC=11423
#> goal5 ~ goal1 + goal2 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - pf_expression                  1       183 237695 11423
#> <none>                                       237512 11423
#> - ef_trade                       1       259 237771 11424
#> - goal15                         1       436 237948 11426
#> - pf_movement                    1       673 238186 11428
#> - ef_money                       1       770 238283 11429
#> - pf_religion                    1       808 238321 11430
#> - GDPpercapita                   1      1287 238799 11435
#> - goal1                          1      1598 239110 11438
#> - pf_law                         1      2320 239833 11446
#> - ef_government                  1      5652 243164 11480
#> - ef_regulation                  1      6020 243532 11484
#> - pf_security                    1      6130 243642 11485
#> - goal13                         1      6404 243916 11488
#> - unemployment.rate              1      7072 244585 11495
#> - MilitaryExpenditurePercentGDP  1      7529 245041 11499
#> - goal10                         1      8896 246408 11513
#> - goal2                          1     12200 249713 11546
#> - goal17                         1     13484 250996 11559
#> - internet_usage                 1     13578 251090 11560
#> - pf_identity                    1     23644 261157 11658
#> 
#> Step:  AIC=11423
#> goal5 ~ goal1 + goal2 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_religion + pf_identity + 
#>     ef_government + ef_money + ef_trade + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> <none>                                       237695 11423
#> - ef_trade                       1       287 237983 11424
#> - goal15                         1       402 238097 11425
#> - pf_movement                    1       549 238244 11427
#> - ef_money                       1       722 238417 11429
#> - GDPpercapita                   1      1412 239108 11436
#> - pf_religion                    1      1526 239221 11437
#> - goal1                          1      1690 239385 11439
#> - pf_law                         1      2147 239842 11444
#> - ef_government                  1      5617 243313 11480
#> - ef_regulation                  1      6218 243913 11486
#> - pf_security                    1      6272 243968 11486
#> - goal13                         1      6352 244047 11487
#> - unemployment.rate              1      6977 244673 11493
#> - MilitaryExpenditurePercentGDP  1      7355 245050 11497
#> - goal10                         1      9011 246706 11514
#> - goal2                          1     12038 249734 11545
#> - goal17                         1     13329 251025 11558
#> - internet_usage                 1     14252 251947 11567
#> - pf_identity                    1     24316 262011 11665
vif(selmod)  # pf_law has the highest VIF (5.41); it is dropped in the refit below
#>                         goal1                         goal2 
#>                          3.84                          1.75 
#>                        goal10                        goal13 
#>                          2.19                          3.69 
#>                        goal15                        goal17 
#>                          1.40                          1.84 
#>             unemployment.rate                  GDPpercapita 
#>                          1.56                          4.41 
#> MilitaryExpenditurePercentGDP                internet_usage 
#>                          1.40                          3.62 
#>                        pf_law                   pf_security 
#>                          5.41                          2.04 
#>                   pf_movement                   pf_religion 
#>                          3.47                          3.02 
#>                   pf_identity                 ef_government 
#>                          1.93                          1.77 
#>                      ef_money                      ef_trade 
#>                          2.58                          3.84 
#>                 ef_regulation 
#>                          2.23
reg_goal5_all_new <- lm(goal5 ~ goal1 + goal2 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + 
                          GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
                          pf_security + pf_movement + pf_religion + pf_identity + 
                          ef_government + ef_money + ef_trade + ef_regulation, data = data_question1)
selmod <- step(reg_goal5_all_new, scope=list(lower=nullmod, upper=reg_goal5_all_new), direction="backward") 
#> Start:  AIC=11444
#> goal5 ~ goal1 + goal2 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_security + pf_movement + pf_religion + pf_identity + ef_government + 
#>     ef_money + ef_trade + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - ef_trade                       1       117 239959 11443
#> <none>                                       239842 11444
#> - goal15                         1       356 240198 11445
#> - GDPpercapita                   1       637 240479 11448
#> - ef_money                       1       675 240517 11449
#> - pf_religion                    1       722 240564 11449
#> - pf_movement                    1       760 240602 11450
#> - goal1                          1      1418 241260 11456
#> - pf_security                    1      4981 244823 11493
#> - unemployment.rate              1      5645 245487 11500
#> - MilitaryExpenditurePercentGDP  1      6534 246376 11509
#> - goal13                         1      8089 247931 11525
#> - ef_government                  1      8113 247955 11525
#> - goal10                         1      8258 248100 11526
#> - ef_regulation                  1      8495 248337 11529
#> - goal2                          1     12158 252000 11565
#> - goal17                         1     12338 252180 11567
#> - internet_usage                 1     14093 253935 11584
#> - pf_identity                    1     25471 265313 11694
#> 
#> Step:  AIC=11443
#> goal5 ~ goal1 + goal2 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_security + pf_movement + pf_religion + pf_identity + ef_government + 
#>     ef_money + ef_regulation
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> <none>                                       239959 11443
#> - goal15                         1       343 240302 11444
#> - ef_money                       1       563 240522 11447
#> - GDPpercapita                   1       644 240603 11448
#> - pf_movement                    1       683 240642 11448
#> - pf_religion                    1       782 240741 11449
#> - goal1                          1      1728 241687 11459
#> - pf_security                    1      5209 245168 11495
#> - unemployment.rate              1      5676 245635 11499
#> - MilitaryExpenditurePercentGDP  1      6704 246663 11510
#> - goal13                         1      7999 247959 11523
#> - goal10                         1      8154 248113 11524
#> - ef_government                  1      8177 248136 11525
#> - ef_regulation                  1      8514 248473 11528
#> - goal2                          1     12060 252019 11563
#> - goal17                         1     12783 252742 11571
#> - internet_usage                 1     14537 254496 11588
#> - pf_identity                    1     25498 265457 11693
vif(selmod)  # highest remaining VIF is 4.04 (GDPpercapita)
#>                         goal1                         goal2 
#>                          3.59                          1.75 
#>                        goal10                        goal13 
#>                          2.15                          3.55 
#>                        goal15                        goal17 
#>                          1.40                          1.79 
#>             unemployment.rate                  GDPpercapita 
#>                          1.49                          4.04 
#> MilitaryExpenditurePercentGDP                internet_usage 
#>                          1.37                          3.58 
#>                   pf_security                   pf_movement 
#>                          1.92                          3.35 
#>                   pf_religion                   pf_identity 
#>                          2.75                          1.88 
#>                 ef_government                      ef_money 
#>                          1.65                          1.96 
#>                 ef_regulation 
#>                          1.99
reg_goal5_all_new <- lm(goal5 ~ goal1 + goal2 + goal10 + goal13 + goal15 + goal17 + unemployment.rate + 
                          GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
                          pf_security + pf_movement + pf_religion + pf_identity + ef_government + 
                          ef_money + ef_regulation, data = data_question1)
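
# Aside (illustrative sketch, not part of the original analysis): how the AIC
# column printed by step() is obtained for an lm fit. step() calls extractAIC(),
# which for lm returns n * log(RSS / n) + 2 * edf, where edf is the number of
# estimated coefficients. The built-in mtcars data and the demo_* names are
# assumptions used only for this example.
demo_fit <- lm(mpg ~ wt + hp, data = mtcars)
demo_n   <- nrow(mtcars)                  # number of observations
demo_rss <- sum(residuals(demo_fit)^2)    # residual sum of squares (the RSS column)
demo_edf <- length(coef(demo_fit))        # intercept plus two slopes
demo_aic <- demo_n * log(demo_rss / demo_n) + 2 * demo_edf
all.equal(demo_aic, extractAIC(demo_fit)[2])  # compares the manual value with step()'s criterion
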
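# reg6: backward selection for goal6
# Intercept-only lower bound; the response is set to goal6 to match the model being selected
nullmod <- lm(goal6 ~ 1, data = data_question1)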
selmod <- step(reg_goal6_all_new, scope=list(lower=nullmod, upper=reg_goal6_all_new), direction="backward") 
#> Start:  AIC=9038
#> goal6 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - goal13                         1         1  91239 9036
#> - goal15                         1         3  91241 9036
#> - goal17                         1        17  91256 9037
#> - pf_law                         1        25  91264 9037
#> - ef_trade                       1        34  91272 9037
#> - goal10                         1        59  91297 9038
#> <none>                                        91238 9038
#> - ef_regulation                  1        84  91322 9039
#> - pf_movement                    1       132  91370 9040
#> - pf_religion                    1       247  91486 9043
#> - ef_government                  1       288  91527 9044
#> - unemployment.rate              1       309  91548 9045
#> - ef_money                       1       343  91582 9046
#> - MilitaryExpenditurePercentGDP  1       478  91717 9049
#> - pf_expression                  1       610  91848 9053
#> - GDPpercapita                   1       632  91870 9054
#> - pf_security                    1       653  91891 9054
#> - goal8                          1       758  91997 9057
#> - population                     1       946  92184 9062
#> - internet_usage                 1      1164  92403 9068
#> - goal5                          1      1421  92659 9075
#> - goal2                          1      6915  98154 9219
#> - pf_identity                    1     10106 101345 9299
#> - goal1                          1     24416 115655 9629
#> 
#> Step:  AIC=9036
#> goal6 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal15 + goal17 + 
#>     unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - goal15                         1         3  91242 9034
#> - goal17                         1        17  91256 9035
#> - pf_law                         1        24  91264 9035
#> - ef_trade                       1        34  91273 9035
#> - goal10                         1        58  91297 9036
#> <none>                                        91239 9036
#> - ef_regulation                  1        85  91325 9037
#> - pf_movement                    1       132  91371 9038
#> - pf_religion                    1       250  91490 9041
#> - ef_government                  1       288  91527 9042
#> - unemployment.rate              1       312  91551 9043
#> - ef_money                       1       345  91584 9044
#> - MilitaryExpenditurePercentGDP  1       497  91737 9048
#> - pf_expression                  1       609  91848 9051
#> - pf_security                    1       652  91891 9052
#> - goal8                          1       758  91997 9055
#> - GDPpercapita                   1       796  92035 9056
#> - population                     1       945  92184 9060
#> - internet_usage                 1      1175  92414 9066
#> - goal5                          1      1447  92686 9074
#> - goal2                          1      7069  98308 9221
#> - pf_identity                    1     10106 101345 9297
#> - goal1                          1     25405 116644 9648
#> 
#> Step:  AIC=9034
#> goal6 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal17 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - goal17                         1        16  91258 9033
#> - pf_law                         1        25  91267 9033
#> - ef_trade                       1        33  91276 9033
#> - goal10                         1        55  91298 9034
#> <none>                                        91242 9034
#> - ef_regulation                  1        85  91327 9035
#> - pf_movement                    1       132  91374 9036
#> - pf_religion                    1       249  91491 9039
#> - ef_government                  1       285  91527 9040
#> - unemployment.rate              1       314  91556 9041
#> - ef_money                       1       345  91588 9042
#> - MilitaryExpenditurePercentGDP  1       496  91738 9046
#> - pf_expression                  1       607  91849 9049
#> - pf_security                    1       650  91892 9050
#> - goal8                          1       755  91997 9053
#> - GDPpercapita                   1       810  92052 9055
#> - population                     1       954  92196 9058
#> - internet_usage                 1      1196  92438 9065
#> - goal5                          1      1444  92687 9072
#> - goal2                          1      7096  98338 9220
#> - pf_identity                    1     10183 101426 9297
#> - goal1                          1     27142 118385 9683
#> 
#> Step:  AIC=9033
#> goal6 ~ goal1 + goal2 + goal5 + goal8 + goal10 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - pf_law                         1        33  91291 9032
#> - ef_trade                       1        39  91297 9032
#> - goal10                         1        61  91319 9033
#> <none>                                        91258 9033
#> - ef_regulation                  1        87  91345 9033
#> - pf_movement                    1       146  91404 9035
#> - pf_religion                    1       256  91513 9038
#> - ef_government                  1       270  91528 9038
#> - unemployment.rate              1       298  91556 9039
#> - ef_money                       1       336  91594 9040
#> - MilitaryExpenditurePercentGDP  1       550  91808 9046
#> - pf_expression                  1       592  91850 9047
#> - pf_security                    1       642  91900 9048
#> - goal8                          1       762  92020 9052
#> - GDPpercapita                   1       798  92055 9053
#> - population                     1       940  92198 9057
#> - internet_usage                 1      1185  92443 9063
#> - goal5                          1      1445  92703 9070
#> - goal2                          1      7136  98394 9219
#> - pf_identity                    1     10179 101437 9295
#> - goal1                          1     28700 119958 9714
#> 
#> Step:  AIC=9032
#> goal6 ~ goal1 + goal2 + goal5 + goal8 + goal10 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - ef_trade                       1        51  91342 9031
#> - ef_regulation                  1        67  91358 9032
#> - goal10                         1        73  91364 9032
#> <none>                                        91291 9032
#> - pf_movement                    1       146  91437 9034
#> - pf_religion                    1       284  91575 9038
#> - ef_government                  1       325  91616 9039
#> - ef_money                       1       326  91617 9039
#> - unemployment.rate              1       353  91644 9039
#> - MilitaryExpenditurePercentGDP  1       521  91812 9044
#> - pf_security                    1       609  91900 9046
#> - pf_expression                  1       670  91961 9048
#> - goal8                          1       789  92080 9051
#> - population                     1       942  92233 9055
#> - GDPpercapita                   1      1094  92385 9060
#> - internet_usage                 1      1169  92460 9062
#> - goal5                          1      1497  92788 9070
#> - goal2                          1      7106  98397 9217
#> - pf_identity                    1     10234 101525 9295
#> - goal1                          1     28930 120221 9718
#> 
#> Step:  AIC=9031
#> goal6 ~ goal1 + goal2 + goal5 + goal8 + goal10 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - ef_regulation                  1        43  91385 9030
#> - goal10                         1        62  91404 9031
#> <none>                                        91342 9031
#> - pf_movement                    1       128  91469 9033
#> - pf_religion                    1       280  91622 9037
#> - ef_government                  1       312  91654 9038
#> - unemployment.rate              1       363  91705 9039
#> - MilitaryExpenditurePercentGDP  1       499  91841 9043
#> - pf_security                    1       582  91924 9045
#> - ef_money                       1       595  91936 9045
#> - pf_expression                  1       709  92051 9049
#> - goal8                          1       840  92182 9052
#> - population                     1      1003  92345 9056
#> - GDPpercapita                   1      1106  92447 9059
#> - internet_usage                 1      1131  92473 9060
#> - goal5                          1      1475  92817 9069
#> - goal2                          1      7180  98522 9218
#> - pf_identity                    1     10668 102010 9305
#> - goal1                          1     30574 121916 9751
#> 
#> Step:  AIC=9030
#> goal6 ~ goal1 + goal2 + goal5 + goal8 + goal10 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - goal10                         1        54  91439 9030
#> <none>                                        91385 9030
#> - pf_movement                    1       155  91540 9033
#> - pf_religion                    1       277  91662 9036
#> - unemployment.rate              1       337  91722 9038
#> - ef_government                  1       370  91755 9038
#> - MilitaryExpenditurePercentGDP  1       511  91896 9042
#> - ef_money                       1       553  91939 9043
#> - pf_security                    1       608  91993 9045
#> - pf_expression                  1       709  92095 9048
#> - goal8                          1       835  92220 9051
#> - population                     1      1007  92392 9056
#> - GDPpercapita                   1      1078  92463 9058
#> - internet_usage                 1      1089  92474 9058
#> - goal5                          1      1434  92819 9067
#> - goal2                          1      7448  98833 9224
#> - pf_identity                    1     11044 102429 9313
#> - goal1                          1     30548 121933 9749
#> 
#> Step:  AIC=9030
#> goal6 ~ goal1 + goal2 + goal5 + goal8 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_security + 
#>     pf_movement + pf_religion + pf_expression + pf_identity + 
#>     ef_government + ef_money + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> <none>                                        91439 9030
#> - pf_movement                    1       169  91608 9032
#> - pf_religion                    1       241  91680 9034
#> - unemployment.rate              1       304  91743 9036
#> - ef_government                  1       422  91861 9039
#> - MilitaryExpenditurePercentGDP  1       519  91958 9042
#> - pf_security                    1       555  91993 9043
#> - ef_money                       1       557  91996 9043
#> - pf_expression                  1       763  92202 9049
#> - goal8                          1       861  92300 9051
#> - GDPpercapita                   1      1153  92592 9059
#> - population                     1      1158  92597 9059
#> - internet_usage                 1      1166  92605 9060
#> - goal5                          1      1381  92820 9065
#> - goal2                          1      7395  98834 9222
#> - pf_identity                    1     11084 102522 9314
#> - goal1                          1     32483 123921 9787
vif(selmod)  # highest remaining VIF is 4.85 (pf_expression)
#>                         goal1                         goal2 
#>                          3.13                          1.88 
#>                         goal5                         goal8 
#>                          2.42                          4.00 
#>             unemployment.rate                  GDPpercapita 
#>                          1.77                          2.83 
#> MilitaryExpenditurePercentGDP                internet_usage 
#>                          1.34                          3.58 
#>                   pf_security                   pf_movement 
#>                          1.79                          3.49 
#>                   pf_religion                 pf_expression 
#>                          4.18                          4.85 
#>                   pf_identity                 ef_government 
#>                          2.17                          1.56 
#>                      ef_money                    population 
#>                          1.91                          1.29
reg_goal6_all_new <- lm(goal6 ~ goal1 + goal2 + goal5 + goal8 + unemployment.rate + GDPpercapita + 
                          MilitaryExpenditurePercentGDP + internet_usage + pf_security + 
                          pf_movement + pf_religion + pf_expression + pf_identity + 
                          ef_government + ef_money + population, data = data_question1)
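
# Aside (illustrative sketch, not part of the original analysis): what a
# variance inflation factor measures. For a numeric predictor j,
# VIF_j = 1 / (1 - R2_j), where R2_j is the R-squared from regressing
# predictor j on the other predictors. vif() above is assumed to be car::vif();
# the mtcars data and the demo_* names are used only for this example.
demo_fit2   <- lm(mpg ~ wt + hp + disp, data = mtcars)
demo_r2_wt  <- summary(lm(wt ~ hp + disp, data = mtcars))$r.squared
demo_vif_wt <- 1 / (1 - demo_r2_wt)       # manual VIF for wt
if (requireNamespace("car", quietly = TRUE)) {
  # Compare with the package value when car is installed
  print(all.equal(demo_vif_wt, unname(car::vif(demo_fit2)["wt"])))
}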

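# reg8: backward selection for goal8
# Intercept-only lower bound; the response is set to goal8 to match the model being selected
nullmod <- lm(goal8 ~ 1, data = data_question1)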
selmod <- step(reg_goal8_all_new, scope=list(lower=nullmod, upper=reg_goal8_all_new), direction="backward") 
#> Start:  AIC=7660
#> goal8 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal10 + goal13 + 
#>     goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq   RSS  AIC
#> - MilitaryExpenditurePercentGDP  1         6 52513 7658
#> - goal17                         1         9 52516 7658
#> - GDPpercapita                   1        13 52520 7658
#> - ef_money                       1        24 52531 7659
#> <none>                                       52507 7660
#> - goal10                         1        46 52554 7660
#> - goal5                          1        71 52579 7661
#> - ef_regulation                  1        86 52593 7662
#> - pf_movement                    1        86 52594 7662
#> - goal13                         1        90 52597 7662
#> - goal15                         1       114 52621 7663
#> - goal7                          1       185 52692 7666
#> - pf_religion                    1       193 52700 7667
#> - pf_law                         1       227 52734 7668
#> - ef_government                  1       450 52957 7679
#> - ef_trade                       1       456 52964 7679
#> - pf_security                    1       521 53028 7682
#> - goal6                          1       558 53065 7684
#> - population                     1       661 53169 7689
#> - internet_usage                 1       745 53253 7693
#> - pf_identity                    1       840 53348 7697
#> - goal1                          1      1149 53657 7712
#> - goal2                          1      1735 54243 7739
#> - pf_expression                  1      2086 54594 7755
#> - unemployment.rate              1     15887 68395 8318
#> 
#> Step:  AIC=7658
#> goal8 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal10 + goal13 + 
#>     goal15 + goal17 + unemployment.rate + GDPpercapita + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                     Df Sum of Sq   RSS  AIC
#> - goal17             1         6 52519 7656
#> - GDPpercapita       1        15 52528 7657
#> - ef_money           1        23 52536 7657
#> <none>                           52513 7658
#> - goal10             1        46 52559 7658
#> - goal5              1        67 52579 7659
#> - ef_regulation      1        88 52601 7660
#> - pf_movement        1        96 52609 7660
#> - goal13             1       100 52613 7661
#> - goal15             1       113 52626 7661
#> - goal7              1       184 52696 7665
#> - pf_religion        1       196 52709 7665
#> - pf_law             1       244 52757 7667
#> - ef_trade           1       465 52978 7678
#> - ef_government      1       467 52979 7678
#> - pf_security        1       515 53028 7680
#> - goal6              1       553 53066 7682
#> - population         1       673 53186 7688
#> - internet_usage     1       746 53259 7691
#> - pf_identity        1       835 53348 7695
#> - goal1              1      1180 53693 7711
#> - goal2              1      1731 54243 7737
#> - pf_expression      1      2094 54607 7754
#> - unemployment.rate  1     15885 68398 8316
#> 
#> Step:  AIC=7656
#> goal8 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal10 + goal13 + 
#>     goal15 + unemployment.rate + GDPpercapita + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                     Df Sum of Sq   RSS  AIC
#> - GDPpercapita       1        17 52536 7655
#> - ef_money           1        25 52545 7655
#> <none>                           52519 7656
#> - goal10             1        48 52567 7656
#> - goal5              1        61 52581 7657
#> - ef_regulation      1        90 52609 7658
#> - pf_movement        1       102 52621 7659
#> - goal13             1       105 52624 7659
#> - goal15             1       117 52637 7660
#> - goal7              1       189 52708 7663
#> - pf_religion        1       199 52719 7664
#> - pf_law             1       258 52777 7666
#> - ef_government      1       467 52986 7676
#> - ef_trade           1       482 53001 7677
#> - pf_security        1       529 53048 7679
#> - goal6              1       559 53078 7681
#> - population         1       716 53235 7688
#> - internet_usage     1       741 53260 7689
#> - pf_identity        1       830 53349 7693
#> - goal1              1      1194 53713 7710
#> - goal2              1      1754 54274 7736
#> - pf_expression      1      2093 54612 7752
#> - unemployment.rate  1     17150 69669 8360
#> 
#> Step:  AIC=7655
#> goal8 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal10 + goal13 + 
#>     goal15 + unemployment.rate + internet_usage + pf_law + pf_security + 
#>     pf_movement + pf_religion + pf_expression + pf_identity + 
#>     ef_government + ef_money + ef_trade + ef_regulation + population
#> 
#>                     Df Sum of Sq   RSS  AIC
#> - ef_money           1        27 52564 7654
#> <none>                           52536 7655
#> - goal10             1        46 52582 7655
#> - goal5              1        64 52600 7656
#> - ef_regulation      1        86 52622 7657
#> - goal13             1        89 52625 7657
#> - pf_movement        1        99 52635 7658
#> - goal15             1       126 52662 7659
#> - goal7              1       185 52721 7662
#> - pf_religion        1       199 52735 7662
#> - pf_law             1       241 52777 7664
#> - ef_government      1       459 52995 7675
#> - ef_trade           1       492 53028 7676
#> - pf_security        1       540 53076 7679
#> - goal6              1       546 53082 7679
#> - population         1       718 53254 7687
#> - internet_usage     1       762 53298 7689
#> - pf_identity        1       835 53371 7692
#> - goal1              1      1262 53799 7712
#> - goal2              1      1752 54288 7735
#> - pf_expression      1      2078 54614 7750
#> - unemployment.rate  1     17439 69975 8369
#> 
#> Step:  AIC=7654
#> goal8 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal10 + goal13 + 
#>     goal15 + unemployment.rate + internet_usage + pf_law + pf_security + 
#>     pf_movement + pf_religion + pf_expression + pf_identity + 
#>     ef_government + ef_trade + ef_regulation + population
#> 
#>                     Df Sum of Sq   RSS  AIC
#> <none>                           52564 7654
#> - goal10             1        43 52606 7654
#> - goal5              1        59 52622 7655
#> - goal13             1        91 52655 7657
#> - ef_regulation      1        92 52655 7657
#> - pf_movement        1       101 52664 7657
#> - goal15             1       130 52693 7658
#> - goal7              1       180 52743 7661
#> - pf_religion        1       209 52773 7662
#> - pf_law             1       246 52810 7664
#> - ef_government      1       483 53046 7675
#> - ef_trade           1       502 53066 7676
#> - goal6              1       532 53095 7677
#> - pf_security        1       559 53123 7679
#> - population         1       710 53273 7686
#> - internet_usage     1       735 53298 7687
#> - pf_identity        1       856 53419 7693
#> - goal1              1      1279 53843 7712
#> - goal2              1      1730 54293 7733
#> - pf_expression      1      2056 54620 7748
#> - unemployment.rate  1     17446 70009 8369
vif(selmod)  # goal6 has the highest VIF (6.68)
#>             goal1             goal2             goal5 
#>              6.03              2.05              2.78 
#>             goal6             goal7            goal10 
#>              6.68              6.40              2.32 
#>            goal13            goal15 unemployment.rate 
#>              2.95              1.45              1.49 
#>    internet_usage            pf_law       pf_security 
#>              3.42              5.28              2.05 
#>       pf_movement       pf_religion     pf_expression 
#>              3.57              4.42              4.87 
#>       pf_identity     ef_government          ef_trade 
#>              2.48              1.73              3.03 
#>     ef_regulation        population 
#>              2.39              1.39
reg_goal8_all_new <- lm(goal8 ~ goal1 + goal2 + goal13 + goal15 + unemployment.rate + 
                          internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
                          pf_expression + pf_identity + ef_government + ef_trade + 
                          population, data = data_question1)

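# reg10: backward selection for goal10
# Intercept-only lower bound; the response is set to goal10 to match the model being selected
nullmod <- lm(goal10 ~ 1, data = data_question1)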
selmod <- step(reg_goal10_all_new, scope=list(lower=nullmod, upper=reg_goal10_all_new), direction="backward") 
#> Start:  AIC=14647
#> goal10 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal13 + 
#>     goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - pf_identity                    1        50 860055 14645
#> - goal7                          1       242 860248 14645
#> - goal6                          1       358 860363 14646
#> - MilitaryExpenditurePercentGDP  1       528 860534 14646
#> <none>                                       860006 14647
#> - goal8                          1       760 860765 14647
#> - goal2                          1      1224 861230 14648
#> - GDPpercapita                   1      1570 861575 14649
#> - internet_usage                 1      1588 861593 14649
#> - ef_money                       1      2335 862341 14652
#> - pf_movement                    1      2657 862662 14652
#> - goal17                         1      2737 862743 14653
#> - goal13                         1      4093 864099 14657
#> - pf_expression                  1      4275 864280 14657
#> - ef_regulation                  1      5777 865783 14662
#> - pf_law                         1      7044 867050 14665
#> - ef_government                  1      7666 867672 14667
#> - ef_trade                       1     12419 872425 14681
#> - population                     1     23975 883980 14713
#> - goal1                          1     29521 889527 14729
#> - goal5                          1     32700 892706 14738
#> - unemployment.rate              1     35435 895441 14746
#> - pf_religion                    1     36114 896120 14748
#> - goal15                         1     68563 928569 14836
#> - pf_security                    1     68653 928659 14837
#> 
#> Step:  AIC=14645
#> goal10 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal13 + 
#>     goal15 + goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - goal7                          1       256 860311 14644
#> - goal6                          1       480 860535 14644
#> - MilitaryExpenditurePercentGDP  1       571 860626 14645
#> <none>                                       860055 14645
#> - goal8                          1       822 860878 14645
#> - goal2                          1      1310 861365 14647
#> - internet_usage                 1      1551 861607 14647
#> - GDPpercapita                   1      1553 861608 14647
#> - ef_money                       1      2303 862359 14650
#> - pf_movement                    1      2610 862665 14650
#> - goal17                         1      2697 862753 14651
#> - goal13                         1      4107 864162 14655
#> - pf_expression                  1      4252 864307 14655
#> - ef_regulation                  1      5736 865792 14660
#> - pf_law                         1      7096 867151 14663
#> - ef_government                  1      7626 867681 14665
#> - ef_trade                       1     12409 872464 14679
#> - population                     1     24602 884658 14713
#> - goal1                          1     29504 889559 14727
#> - goal5                          1     33741 893797 14739
#> - unemployment.rate              1     35385 895441 14744
#> - pf_religion                    1     36770 896825 14748
#> - pf_security                    1     68619 928675 14835
#> - goal15                         1     69906 929962 14838
#> 
#> Step:  AIC=14644
#> goal10 ~ goal1 + goal2 + goal5 + goal6 + goal8 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - MilitaryExpenditurePercentGDP  1       561 860873 14643
#> <none>                                       860311 14644
#> - goal6                          1       756 861067 14644
#> - goal8                          1       778 861089 14644
#> - goal2                          1      1240 861552 14645
#> - GDPpercapita                   1      1506 861818 14646
#> - internet_usage                 1      1842 862153 14647
#> - ef_money                       1      2232 862543 14648
#> - pf_movement                    1      2553 862864 14649
#> - goal17                         1      2604 862916 14649
#> - goal13                         1      3907 864218 14653
#> - pf_expression                  1      4154 864466 14654
#> - ef_regulation                  1      5481 865793 14658
#> - ef_government                  1      7376 867688 14663
#> - pf_law                         1      7590 867901 14664
#> - ef_trade                       1     12156 872468 14677
#> - population                     1     24477 884789 14712
#> - goal5                          1     34018 894329 14739
#> - unemployment.rate              1     35130 895441 14742
#> - pf_religion                    1     37090 897402 14747
#> - goal1                          1     42283 902594 14762
#> - pf_security                    1     68765 929076 14834
#> - goal15                         1     69860 930172 14837
#> 
#> Step:  AIC=14643
#> goal10 ~ goal1 + goal2 + goal5 + goal6 + goal8 + goal13 + goal15 + 
#>     goal17 + unemployment.rate + GDPpercapita + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_religion + pf_expression + 
#>     ef_government + ef_money + ef_trade + ef_regulation + population
#> 
#>                     Df Sum of Sq    RSS   AIC
#> <none>                           860873 14643
#> - goal8              1       784 861657 14644
#> - goal6              1       923 861795 14644
#> - goal2              1      1206 862078 14645
#> - GDPpercapita       1      1728 862600 14646
#> - internet_usage     1      1810 862682 14647
#> - ef_money           1      2203 863076 14648
#> - pf_movement        1      2218 863091 14648
#> - goal17             1      3229 864101 14651
#> - goal13             1      3539 864411 14652
#> - pf_expression      1      4580 865452 14655
#> - ef_regulation      1      5593 866466 14657
#> - ef_government      1      7013 867886 14662
#> - pf_law             1      7139 868011 14662
#> - ef_trade           1     12498 873371 14677
#> - population         1     24932 885805 14713
#> - goal5              1     33606 894478 14737
#> - unemployment.rate  1     35509 896381 14742
#> - pf_religion        1     37388 898260 14748
#> - goal1              1     41727 902599 14760
#> - goal15             1     70361 931234 14838
#> - pf_security        1     71594 932467 14841
vif(selmod) # highest VIF: goal6 (5.74), removed in the refit that follows
#>             goal1             goal2             goal5 
#>              4.82              2.09              2.52 
#>             goal6             goal8            goal13 
#>              5.74              4.07              3.69 
#>            goal15            goal17 unemployment.rate 
#>              1.33              1.92              2.03 
#>      GDPpercapita    internet_usage            pf_law 
#>              4.45              3.98              5.57 
#>       pf_security       pf_movement       pf_religion 
#>              1.94              3.54              4.15 
#>     pf_expression     ef_government          ef_money 
#>              5.02              1.80              2.60 
#>          ef_trade     ef_regulation        population 
#>              3.81              2.24              1.35
reg_goal10_all_new <- lm(goal10 ~ goal1 + goal2 + goal5 + goal8 + goal13 + goal15 + 
                           goal17 + unemployment.rate + GDPpercapita + internet_usage + 
                           pf_law + pf_security + pf_movement + pf_religion + pf_expression + 
                           ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
selmod <- step(reg_goal10_all_new, scope=list(lower=nullmod, upper=reg_goal10_all_new), direction="backward") 
#> Start:  AIC=14644
#> goal10 ~ goal1 + goal2 + goal5 + goal8 + goal13 + goal15 + goal17 + 
#>     unemployment.rate + GDPpercapita + internet_usage + pf_law + 
#>     pf_security + pf_movement + pf_religion + pf_expression + 
#>     ef_government + ef_money + ef_trade + ef_regulation + population
#> 
#>                     Df Sum of Sq    RSS   AIC
#> <none>                           861795 14644
#> - goal2              1       794 862589 14644
#> - goal8              1      1061 862856 14645
#> - GDPpercapita       1      1978 863774 14648
#> - internet_usage     1      2077 863872 14648
#> - pf_movement        1      2179 863974 14648
#> - ef_money           1      2337 864132 14649
#> - goal17             1      3272 865067 14651
#> - goal13             1      3476 865271 14652
#> - pf_expression      1      4751 866546 14656
#> - ef_regulation      1      5280 867076 14657
#> - ef_government      1      7178 868973 14663
#> - pf_law             1      7232 869028 14663
#> - ef_trade           1     12165 873960 14677
#> - population         1     25473 887269 14715
#> - goal5              1     32877 894672 14736
#> - unemployment.rate  1     34902 896697 14741
#> - pf_religion        1     36561 898356 14746
#> - goal1              1     64543 926339 14822
#> - pf_security        1     70730 932526 14839
#> - goal15             1     71162 932957 14840
vif(selmod) # highest VIF: pf_law (5.57), removed in the refit that follows
#>             goal1             goal2             goal5 
#>              3.60              1.97              2.38 
#>             goal8            goal13            goal15 
#>              3.99              3.69              1.32 
#>            goal17 unemployment.rate      GDPpercapita 
#>              1.92              2.03              4.42 
#>    internet_usage            pf_law       pf_security 
#>              3.94              5.57              1.93 
#>       pf_movement       pf_religion     pf_expression 
#>              3.54              4.10              5.01 
#>     ef_government          ef_money          ef_trade 
#>              1.80              2.59              3.80 
#>     ef_regulation        population 
#>              2.23              1.35
reg_goal10_all_new <- lm(goal10 ~ goal1 + goal2 + goal5 + goal8 + goal13 + goal15 + goal17 + 
                           unemployment.rate + GDPpercapita + internet_usage + 
                           pf_security + pf_movement + pf_religion + pf_expression + 
                           ef_government + ef_money + ef_trade + ef_regulation + population, data = data_question1)
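
The pattern repeated above (backward AIC selection, inspect the VIFs, drop the most collinear predictor, refit) can be automated. The block below is only a minimal sketch under stated assumptions: `manual_vif()` and `select_and_prune()` are hypothetical helper names, the cut-off of 5 is an assumed threshold rather than one stated in this report, and the built-in `mtcars` data stands in for `data_question1` so that the example runs on its own.

```r
# Hypothetical helper: VIF from first principles, VIF_j = 1 / (1 - R_j^2),
# where R_j^2 comes from regressing predictor j on all other predictors.
# Assumes an intercept and numeric predictors only (no factors or interactions).
manual_vif <- function(model) {
  X <- model.matrix(model)[, -1, drop = FALSE]   # drop the intercept column
  if (ncol(X) < 2) {
    # With a single predictor there is nothing to be collinear with.
    return(setNames(rep(1, ncol(X)), colnames(X)))
  }
  vapply(colnames(X), function(v) {
    others <- X[, colnames(X) != v, drop = FALSE]
    r2 <- summary(lm(X[, v] ~ others))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

# Hypothetical helper: run backward AIC selection, then remove the predictor
# with the largest VIF, and repeat until every VIF is at or below `threshold`.
select_and_prune <- function(full_model, threshold = 5) {
  current <- full_model
  repeat {
    current <- step(current, direction = "backward", trace = 0)
    v <- manual_vif(current)
    if (length(v) < 2 || max(v) <= threshold) break
    worst <- names(which.max(v))
    current <- update(current, as.formula(paste(". ~ . -", worst)))
  }
  current
}

# Stand-in data: mtcars replaces data_question1 purely for illustration.
fit <- select_and_prune(lm(mpg ~ ., data = mtcars))
formula(fit)
round(manual_vif(fit), 2)
```

Applied to this report, the call would take one of the full models (for example `reg_goal10_all_new`) in place of the `mtcars` fit. Automating the loop makes the choice of threshold explicit, but it also removes the manual judgement about which collinear variable is more meaningful to keep.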

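# reg13: backward selection for goal13
# Intercept-only lower bound for step(); response matched to the model being selected
nullmod <- lm(goal13 ~ 1, data = data_question1)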
selmod <- step(reg_goal13_all_new, scope=list(lower=nullmod, upper=reg_goal13_all_new), direction="backward") 
#> Start:  AIC=10432
#> goal13 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + 
#>     goal11 + goal12 + goal15 + goal17 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_law + 
#>     pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - unemployment.rate              1         8 158983 10430
#> - goal11                         1        15 158991 10430
#> - goal2                          1        31 159006 10430
#> - internet_usage                 1        54 159029 10431
#> - goal1                          1       106 159081 10432
#> <none>                                       158975 10432
#> - pf_identity                    1       130 159105 10432
#> - pf_security                    1       152 159127 10432
#> - goal10                         1       156 159132 10432
#> - goal6                          1       189 159164 10433
#> - goal17                         1       216 159191 10433
#> - pf_law                         1       240 159216 10434
#> - ef_government                  1       311 159286 10435
#> - ef_money                       1       317 159293 10435
#> - population                     1       390 159365 10436
#> - pf_movement                    1       404 159379 10436
#> - goal5                          1       576 159551 10439
#> - pf_religion                    1       591 159567 10439
#> - ef_trade                       1       732 159707 10441
#> - goal8                          1      1191 160166 10449
#> - goal15                         1      1519 160495 10454
#> - goal7                          1      1595 160570 10455
#> - GDPpercapita                   1      1686 160661 10456
#> - pf_expression                  1      1807 160782 10458
#> - ef_regulation                  1      1905 160880 10460
#> - MilitaryExpenditurePercentGDP  1      6340 165315 10528
#> - goal12                         1     91640 250615 11567
#> 
#> Step:  AIC=10430
#> goal13 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + 
#>     goal11 + goal12 + goal15 + goal17 + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - goal11                         1        15 158998 10428
#> - goal2                          1        30 159013 10429
#> - internet_usage                 1        54 159037 10429
#> - goal1                          1       100 159083 10430
#> <none>                                       158983 10430
#> - pf_identity                    1       127 159111 10430
#> - goal10                         1       149 159132 10430
#> - pf_security                    1       150 159133 10430
#> - goal6                          1       186 159169 10431
#> - pf_law                         1       233 159216 10432
#> - goal17                         1       240 159223 10432
#> - ef_government                  1       304 159287 10433
#> - ef_money                       1       319 159302 10433
#> - population                     1       397 159380 10434
#> - pf_movement                    1       401 159384 10434
#> - goal5                          1       569 159552 10437
#> - pf_religion                    1       588 159571 10437
#> - ef_trade                       1       728 159712 10440
#> - goal8                          1      1439 160423 10451
#> - goal15                         1      1551 160534 10452
#> - goal7                          1      1587 160570 10453
#> - GDPpercapita                   1      1684 160667 10454
#> - pf_expression                  1      1799 160782 10456
#> - ef_regulation                  1      1931 160914 10458
#> - MilitaryExpenditurePercentGDP  1      6355 165338 10526
#> - goal12                         1     92144 251128 11571
#> 
#> Step:  AIC=10428
#> goal13 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + 
#>     goal12 + goal15 + goal17 + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - goal2                          1        30 159028 10427
#> - internet_usage                 1        53 159051 10427
#> - goal1                          1       126 159124 10428
#> <none>                                       158998 10428
#> - pf_identity                    1       143 159141 10429
#> - pf_security                    1       144 159142 10429
#> - goal10                         1       169 159167 10429
#> - goal6                          1       193 159190 10429
#> - pf_law                         1       244 159242 10430
#> - goal17                         1       251 159249 10430
#> - ef_money                       1       329 159326 10432
#> - ef_government                  1       342 159339 10432
#> - pf_movement                    1       391 159389 10432
#> - population                     1       466 159464 10434
#> - goal5                          1       557 159555 10435
#> - pf_religion                    1       578 159576 10435
#> - ef_trade                       1       730 159728 10438
#> - goal8                          1      1426 160424 10449
#> - goal15                         1      1552 160550 10451
#> - GDPpercapita                   1      1673 160671 10453
#> - pf_expression                  1      1799 160797 10454
#> - goal7                          1      1863 160861 10455
#> - ef_regulation                  1      1925 160923 10456
#> - MilitaryExpenditurePercentGDP  1      6341 165339 10524
#> - goal12                         1     92371 251369 11571
#> 
#> Step:  AIC=10427
#> goal13 ~ goal1 + goal5 + goal6 + goal7 + goal8 + goal10 + goal12 + 
#>     goal15 + goal17 + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + ef_money + 
#>     ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - internet_usage                 1        57 159085 10426
#> <none>                                       159028 10427
#> - pf_identity                    1       130 159158 10427
#> - goal1                          1       131 159159 10427
#> - goal10                         1       175 159203 10428
#> - pf_security                    1       178 159206 10428
#> - pf_law                         1       244 159272 10429
#> - goal6                          1       248 159276 10429
#> - goal17                         1       268 159295 10429
#> - ef_money                       1       347 159375 10430
#> - ef_government                  1       354 159382 10430
#> - pf_movement                    1       413 159441 10431
#> - population                     1       444 159472 10432
#> - goal5                          1       530 159558 10433
#> - pf_religion                    1       585 159613 10434
#> - ef_trade                       1       750 159778 10437
#> - goal8                          1      1397 160425 10447
#> - goal15                         1      1551 160578 10449
#> - GDPpercapita                   1      1652 160680 10451
#> - pf_expression                  1      1802 160830 10453
#> - goal7                          1      1912 160939 10455
#> - ef_regulation                  1      1985 161013 10456
#> - MilitaryExpenditurePercentGDP  1      6406 165434 10524
#> - goal12                         1     96195 255223 11607
#> 
#> Step:  AIC=10426
#> goal13 ~ goal1 + goal5 + goal6 + goal7 + goal8 + goal10 + goal12 + 
#>     goal15 + goal17 + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     pf_law + pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - pf_identity                    1       116 159201 10426
#> <none>                                       159085 10426
#> - goal1                          1       148 159233 10426
#> - goal10                         1       167 159251 10426
#> - pf_security                    1       202 159286 10427
#> - pf_law                         1       238 159322 10427
#> - goal17                         1       259 159344 10428
#> - goal6                          1       268 159353 10428
#> - ef_government                  1       368 159452 10429
#> - ef_money                       1       406 159491 10430
#> - population                     1       436 159521 10431
#> - pf_movement                    1       438 159522 10431
#> - goal5                          1       485 159570 10431
#> - pf_religion                    1       614 159698 10433
#> - ef_trade                       1       715 159799 10435
#> - goal8                          1      1346 160431 10445
#> - GDPpercapita                   1      1640 160725 10449
#> - pf_expression                  1      1749 160833 10451
#> - goal15                         1      1768 160853 10451
#> - ef_regulation                  1      1933 161018 10454
#> - goal7                          1      2114 161198 10457
#> - MilitaryExpenditurePercentGDP  1      6411 165496 10522
#> - goal12                         1     96825 255909 11612
#> 
#> Step:  AIC=10426
#> goal13 ~ goal1 + goal5 + goal6 + goal7 + goal8 + goal10 + goal12 + 
#>     goal15 + goal17 + GDPpercapita + MilitaryExpenditurePercentGDP + 
#>     pf_law + pf_security + pf_movement + pf_religion + pf_expression + 
#>     ef_government + ef_money + ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> <none>                                       159201 10426
#> - goal1                          1       139 159340 10426
#> - goal10                         1       168 159368 10426
#> - pf_security                    1       180 159381 10426
#> - goal17                         1       227 159427 10427
#> - pf_law                         1       248 159448 10427
#> - ef_government                  1       346 159546 10429
#> - population                     1       365 159565 10429
#> - ef_money                       1       372 159573 10429
#> - pf_movement                    1       389 159590 10430
#> - goal6                          1       396 159596 10430
#> - goal5                          1       420 159620 10430
#> - pf_religion                    1       728 159928 10435
#> - ef_trade                       1       791 159992 10436
#> - goal8                          1      1283 160483 10444
#> - pf_expression                  1      1643 160843 10449
#> - GDPpercapita                   1      1761 160962 10451
#> - goal15                         1      1876 161077 10453
#> - ef_regulation                  1      2106 161306 10456
#> - goal7                          1      2159 161360 10457
#> - MilitaryExpenditurePercentGDP  1      6659 165860 10526
#> - goal12                         1     96984 256185 11612
vif(selmod) # highest VIF: goal12 (10.23)
#>                         goal1                         goal5 
#>                          7.11                          2.72 
#>                         goal6                         goal7 
#>                          6.03                          6.08 
#>                         goal8                        goal10 
#>                          2.99                          2.24 
#>                        goal12                        goal15 
#>                         10.23                          1.33 
#>                        goal17                  GDPpercapita 
#>                          2.01                          4.56 
#> MilitaryExpenditurePercentGDP                        pf_law 
#>                          1.42                          5.98 
#>                   pf_security                   pf_movement 
#>                          2.01                          3.63 
#>                   pf_religion                 pf_expression 
#>                          4.30                          5.16 
#>                 ef_government                      ef_money 
#>                          1.81                          2.51 
#>                      ef_trade                 ef_regulation 
#>                          4.04                          2.24 
#>                    population 
#>                          1.39
reg_goal13_all_new <- lm(goal13 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal17 + unemployment.rate + 
                           GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
                           pf_law + pf_religion + ef_government + ef_regulation, data = data_question1)
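
To help interpret the VIF tables, recall the standard definition, which the `vif()` calls here are assumed to follow since the models contain only numeric predictors. For predictor $x_j$,

$$
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
$$

where $R_j^2$ is the coefficient of determination obtained by regressing $x_j$ on all other predictors in the model. As a worked example, the value of 10.23 reported for goal12 corresponds to

$$
R_j^2 = 1 - \frac{1}{10.23} \approx 0.902,
$$

meaning that roughly 90% of the variance of goal12 is explained by the remaining predictors. By the same formula, a VIF of 5 corresponds to $R_j^2 = 0.8$.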

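# reg15: backward selection for goal15
# Intercept-only lower bound for step(); response matched to the model being selected
nullmod <- lm(goal15 ~ 1, data = data_question1)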
selmod <- step(reg_goal15_all_new, scope=list(lower=nullmod, upper=reg_goal15_all_new), direction="backward") 
#> Start:  AIC=11847
#> goal15 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + 
#>     goal11 + goal12 + goal13 + goal17 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_law + 
#>     pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - ef_regulation                  1         0 280070 11845
#> - ef_trade                       1         6 280075 11845
#> - ef_money                       1        24 280093 11845
#> - goal2                          1        33 280102 11845
#> - MilitaryExpenditurePercentGDP  1        68 280137 11846
#> - pf_religion                    1        85 280155 11846
#> - goal6                          1       127 280196 11846
#> <none>                                       280070 11847
#> - pf_movement                    1       423 280493 11849
#> - goal17                         1       648 280717 11851
#> - goal7                          1       747 280817 11852
#> - pf_law                         1       883 280953 11853
#> - pf_expression                  1       981 281051 11854
#> - goal8                          1      1890 281959 11862
#> - goal5                          1      2038 282108 11863
#> - goal13                         1      2677 282746 11869
#> - goal1                          1      3339 283408 11875
#> - pf_security                    1      3559 283628 11877
#> - GDPpercapita                   1      3749 283819 11878
#> - ef_government                  1      4201 284270 11882
#> - pf_identity                    1      4254 284324 11883
#> - goal12                         1      4969 285039 11889
#> - population                     1     10237 290307 11935
#> - goal11                         1     11391 291460 11945
#> - internet_usage                 1     12196 292266 11952
#> - goal10                         1     15006 295075 11976
#> - unemployment.rate              1     15386 295456 11979
#> 
#> Step:  AIC=11845
#> goal15 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + 
#>     goal11 + goal12 + goal13 + goal17 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_law + 
#>     pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - ef_trade                       1         5 280075 11843
#> - ef_money                       1        24 280094 11843
#> - goal2                          1        33 280102 11843
#> - MilitaryExpenditurePercentGDP  1        68 280138 11844
#> - pf_religion                    1        85 280155 11844
#> - goal6                          1       127 280196 11844
#> <none>                                       280070 11845
#> - pf_movement                    1       424 280494 11847
#> - goal17                         1       649 280719 11849
#> - goal7                          1       788 280858 11850
#> - pf_law                         1       953 281023 11852
#> - pf_expression                  1       985 281055 11852
#> - goal8                          1      1902 281972 11860
#> - goal5                          1      2087 282157 11862
#> - goal13                         1      2715 282785 11867
#> - goal1                          1      3351 283421 11873
#> - pf_security                    1      3559 283629 11875
#> - GDPpercapita                   1      3750 283820 11876
#> - pf_identity                    1      4353 284423 11882
#> - ef_government                  1      4468 284538 11883
#> - goal12                         1      5041 285110 11888
#> - population                     1     10286 290356 11933
#> - goal11                         1     11691 291760 11945
#> - internet_usage                 1     12800 292870 11955
#> - goal10                         1     15171 295241 11975
#> - unemployment.rate              1     15451 295521 11977
#> 
#> Step:  AIC=11843
#> goal15 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + 
#>     goal11 + goal12 + goal13 + goal17 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_law + 
#>     pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - ef_money                       1        19 280094 11841
#> - goal2                          1        32 280107 11841
#> - MilitaryExpenditurePercentGDP  1        71 280147 11842
#> - pf_religion                    1        84 280159 11842
#> - goal6                          1       125 280200 11842
#> <none>                                       280075 11843
#> - pf_movement                    1       453 280528 11845
#> - goal17                         1       660 280735 11847
#> - goal7                          1       783 280858 11848
#> - pf_law                         1       948 281023 11850
#> - pf_expression                  1       990 281065 11850
#> - goal8                          1      1937 282012 11858
#> - goal5                          1      2081 282157 11860
#> - goal13                         1      2731 282806 11865
#> - goal1                          1      3347 283422 11871
#> - pf_security                    1      3554 283629 11873
#> - GDPpercapita                   1      3828 283903 11875
#> - pf_identity                    1      4398 284473 11880
#> - ef_government                  1      4502 284577 11881
#> - goal12                         1      5227 285302 11887
#> - population                     1     10394 290469 11932
#> - goal11                         1     11691 291767 11943
#> - internet_usage                 1     12813 292889 11953
#> - goal10                         1     15267 295342 11974
#> - unemployment.rate              1     15514 295589 11976
#> 
#> Step:  AIC=11841
#> goal15 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + 
#>     goal11 + goal12 + goal13 + goal17 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_law + 
#>     pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - goal2                          1        38 280131 11840
#> - MilitaryExpenditurePercentGDP  1        68 280162 11840
#> - pf_religion                    1        91 280185 11840
#> - goal6                          1       120 280214 11840
#> <none>                                       280094 11841
#> - pf_movement                    1       438 280532 11843
#> - goal17                         1       677 280770 11845
#> - goal7                          1       774 280868 11846
#> - pf_law                         1       955 281049 11848
#> - pf_expression                  1       977 281071 11848
#> - goal8                          1      1931 282025 11857
#> - goal5                          1      2073 282167 11858
#> - goal13                         1      2713 282806 11863
#> - goal1                          1      3329 283423 11869
#> - pf_security                    1      3542 283636 11871
#> - GDPpercapita                   1      3813 283906 11873
#> - pf_identity                    1      4435 284529 11879
#> - ef_government                  1      4887 284981 11883
#> - goal12                         1      5227 285321 11886
#> - population                     1     10434 290528 11931
#> - goal11                         1     11930 292024 11944
#> - internet_usage                 1     13028 293122 11953
#> - goal10                         1     15253 295347 11972
#> - unemployment.rate              1     15506 295600 11974
#> 
#> Step:  AIC=11840
#> goal15 ~ goal1 + goal5 + goal6 + goal7 + goal8 + goal10 + goal11 + 
#>     goal12 + goal13 + goal17 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_law + 
#>     pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - MilitaryExpenditurePercentGDP  1        72 280204 11838
#> - pf_religion                    1        90 280221 11839
#> - goal6                          1        94 280226 11839
#> <none>                                       280131 11840
#> - pf_movement                    1       462 280593 11842
#> - goal17                         1       654 280786 11844
#> - goal7                          1       813 280945 11845
#> - pf_law                         1       946 281077 11846
#> - pf_expression                  1       972 281103 11846
#> - goal8                          1      1896 282028 11855
#> - goal5                          1      2036 282167 11856
#> - goal13                         1      2695 282826 11862
#> - goal1                          1      3344 283476 11867
#> - pf_security                    1      3841 283972 11872
#> - GDPpercapita                   1      3876 284007 11872
#> - pf_identity                    1      4584 284716 11878
#> - ef_government                  1      4860 284992 11881
#> - goal12                         1      5395 285527 11885
#> - population                     1     10745 290876 11932
#> - goal11                         1     11940 292071 11942
#> - internet_usage                 1     12994 293126 11951
#> - goal10                         1     15360 295491 11971
#> - unemployment.rate              1     15476 295607 11972
#> 
#> Step:  AIC=11838
#> goal15 ~ goal1 + goal5 + goal6 + goal7 + goal8 + goal10 + goal11 + 
#>     goal12 + goal13 + goal17 + unemployment.rate + GDPpercapita + 
#>     internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
#>     pf_expression + pf_identity + ef_government + population
#> 
#>                     Df Sum of Sq    RSS   AIC
#> - goal6              1        82 280286 11837
#> - pf_religion        1        96 280300 11837
#> <none>                           280204 11838
#> - pf_movement        1       414 280618 11840
#> - goal17             1       598 280802 11842
#> - goal7              1       806 281010 11844
#> - pf_law             1       884 281087 11844
#> - pf_expression      1       929 281133 11845
#> - goal8              1      1895 282099 11853
#> - goal5              1      1963 282167 11854
#> - goal13             1      2624 282828 11860
#> - goal1              1      3276 283480 11865
#> - GDPpercapita       1      3968 284172 11871
#> - pf_security        1      4039 284243 11872
#> - pf_identity        1      4517 284721 11876
#> - ef_government      1      5065 285269 11881
#> - goal12             1      5328 285532 11883
#> - population         1     10681 290885 11930
#> - goal11             1     11873 292077 11940
#> - internet_usage     1     12983 293187 11950
#> - goal10             1     15325 295529 11969
#> - unemployment.rate  1     15543 295747 11971
#> 
#> Step:  AIC=11837
#> goal15 ~ goal1 + goal5 + goal7 + goal8 + goal10 + goal11 + goal12 + 
#>     goal13 + goal17 + unemployment.rate + GDPpercapita + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + population
#> 
#>                     Df Sum of Sq    RSS   AIC
#> - pf_religion        1       109 280395 11836
#> <none>                           280286 11837
#> - pf_movement        1       398 280684 11839
#> - goal17             1       614 280900 11841
#> - goal7              1       728 281014 11842
#> - pf_law             1       925 281211 11843
#> - pf_expression      1       984 281270 11844
#> - goal8              1      2068 282354 11853
#> - goal5              1      2076 282362 11854
#> - goal13             1      2693 282979 11859
#> - goal1              1      3202 283488 11863
#> - GDPpercapita       1      3913 284199 11870
#> - pf_security        1      4078 284364 11871
#> - ef_government      1      5165 285451 11881
#> - pf_identity        1      5199 285485 11881
#> - goal12             1      5486 285772 11884
#> - population         1     10798 291084 11930
#> - goal11             1     11797 292083 11938
#> - internet_usage     1     13350 293636 11951
#> - goal10             1     15402 295688 11969
#> - unemployment.rate  1     15709 295995 11971
#> 
#> Step:  AIC=11836
#> goal15 ~ goal1 + goal5 + goal7 + goal8 + goal10 + goal11 + goal12 + 
#>     goal13 + goal17 + unemployment.rate + GDPpercapita + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_expression + pf_identity + 
#>     ef_government + population
#> 
#>                     Df Sum of Sq    RSS   AIC
#> <none>                           280395 11836
#> - pf_movement        1       595 280990 11839
#> - goal17             1       646 281041 11840
#> - goal7              1       731 281126 11841
#> - pf_law             1       857 281252 11842
#> - pf_expression      1      1705 282099 11849
#> - goal5              1      2044 282439 11852
#> - goal8              1      2161 282556 11853
#> - goal13             1      2762 283157 11859
#> - goal1              1      3455 283850 11865
#> - GDPpercapita       1      3883 284277 11868
#> - pf_security        1      4353 284748 11873
#> - ef_government      1      5057 285451 11879
#> - goal12             1      5464 285859 11882
#> - pf_identity        1      5777 286172 11885
#> - goal11             1     12063 292457 11939
#> - population         1     13563 293958 11952
#> - internet_usage     1     13582 293977 11952
#> - goal10             1     15559 295954 11969
#> - unemployment.rate  1     15826 296221 11971
vif(selmod) 
#>             goal1             goal5             goal7 
#>              7.12              2.90              6.16 
#>             goal8            goal10            goal11 
#>              3.92              2.12              5.85 
#>            goal12            goal13            goal17 
#>             15.43              5.69              2.02 
#> unemployment.rate      GDPpercapita    internet_usage 
#>              2.02              5.17              3.60 
#>            pf_law       pf_security       pf_movement 
#>              5.57              1.96              3.13 
#>     pf_expression       pf_identity     ef_government 
#>              4.16              2.19              1.65 
#>        population 
#>              1.27
reg_goal15_all_new <- lm(goal15 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal17 + unemployment.rate + 
                           GDPpercapita + internet_usage + pf_law + pf_security + pf_religion + 
                           pf_expression + pf_identity + ef_government + population, data = data_question1)
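
The AIC column in the `step()` traces can be reproduced by hand. For a linear model, `step()` relies on `extractAIC()`, which returns n * log(RSS / n) + 2 * edf, where edf is the number of estimated coefficients. The sketch below checks this on the built-in `mtcars` data (a stand-in, not our dataset) and then applies the same arithmetic to the last goal15 step. It assumes that n = 2499, the observation count reported in the regression table for goals 1 and 2, also holds for the goal15 model.

```r
# Toy illustration on the built-in mtcars data (not the report's data).
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

n   <- nrow(mtcars)
rss <- sum(residuals(fit)^2)
edf <- n - df.residual(fit)        # estimated coefficients, intercept included

aic_by_hand <- n * log(rss / n) + 2 * edf
aic_step    <- extractAIC(fit)[2]  # the value step() prints in its AIC column

all.equal(aic_by_hand, aic_step)

# Same arithmetic for the final goal15 step (RSS = 280395).
# Assumption: n = 2499 observations, taken from the regression table.
n_sdg   <- 2499
rss_sdg <- 280395
edf_sdg <- 19 + 1                  # 19 retained predictors + intercept
aic_sdg <- n_sdg * log(rss_sdg / n_sdg) + 2 * edf_sdg
round(aic_sdg)
```

Under that assumption, the result can be compared directly with the `AIC=11836` line of the trace. Note that `extractAIC()` differs from `AIC()` by an additive constant, so only differences between models are meaningful.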

#reg16
nullmod <- lm(goal16 ~ 1, data = data_question1)
selmod <- step(reg_goal16_all_new, scope=list(lower=nullmod, upper=reg_goal16_all_new), direction="backward") 
#> Start:  AIC=8709
#> goal16 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + 
#>     goal11 + goal12 + goal13 + goal15 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_law + 
#>     pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq   RSS  AIC
#> - goal8                          1        29 79815 8708
#> <none>                                       79786 8709
#> - ef_government                  1        77 79863 8710
#> - pf_identity                    1       115 79901 8711
#> - ef_trade                       1       115 79901 8711
#> - goal5                          1       128 79914 8711
#> - internet_usage                 1       143 79928 8712
#> - goal15                         1       167 79953 8712
#> - goal7                          1       170 79956 8712
#> - ef_regulation                  1       188 79973 8713
#> - goal1                          1       209 79995 8714
#> - GDPpercapita                   1       245 80031 8715
#> - goal2                          1       406 80192 8720
#> - goal13                         1       449 80235 8721
#> - MilitaryExpenditurePercentGDP  1       479 80265 8722
#> - ef_money                       1       530 80315 8724
#> - goal12                         1       535 80320 8724
#> - population                     1      1008 80793 8739
#> - pf_movement                    1      1220 81006 8745
#> - goal6                          1      1295 81081 8747
#> - pf_religion                    1      1723 81509 8761
#> - unemployment.rate              1      2129 81914 8773
#> - pf_expression                  1      3400 83186 8811
#> - goal10                         1      3946 83732 8828
#> - goal11                         1      5539 85325 8875
#> - pf_security                    1      6461 86247 8902
#> - pf_law                         1      8535 88320 8961
#> 
#> Step:  AIC=8708
#> goal16 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal10 + goal11 + 
#>     goal12 + goal13 + goal15 + unemployment.rate + GDPpercapita + 
#>     MilitaryExpenditurePercentGDP + internet_usage + pf_law + 
#>     pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq   RSS  AIC
#> <none>                                       79815 8708
#> - ef_government                  1        85 79900 8709
#> - ef_trade                       1       105 79919 8709
#> - goal5                          1       127 79941 8710
#> - pf_identity                    1       130 79944 8710
#> - internet_usage                 1       130 79945 8710
#> - ef_regulation                  1       179 79993 8712
#> - goal15                         1       180 79995 8712
#> - goal7                          1       185 80000 8712
#> - goal1                          1       193 80007 8712
#> - GDPpercapita                   1       247 80062 8714
#> - goal13                         1       433 80248 8720
#> - goal2                          1       455 80270 8720
#> - MilitaryExpenditurePercentGDP  1       477 80291 8721
#> - goal12                         1       518 80333 8722
#> - ef_money                       1       523 80338 8722
#> - population                     1       980 80795 8737
#> - pf_movement                    1      1247 81062 8745
#> - goal6                          1      1349 81164 8748
#> - pf_religion                    1      1701 81515 8759
#> - unemployment.rate              1      2510 82325 8783
#> - pf_expression                  1      3687 83502 8819
#> - goal10                         1      3993 83808 8828
#> - goal11                         1      5743 85557 8880
#> - pf_security                    1      6636 86451 8906
#> - pf_law                         1      8667 88482 8964
vif(selmod) 
#>                         goal1                         goal2 
#>                          7.27                          2.11 
#>                         goal5                         goal6 
#>                          3.02                          6.92 
#>                         goal7                        goal10 
#>                          7.02                          2.38 
#>                        goal11                        goal12 
#>                          6.25                         16.14 
#>                        goal13                        goal15 
#>                          6.06                          1.55 
#>             unemployment.rate                  GDPpercapita 
#>                          1.56                          5.18 
#> MilitaryExpenditurePercentGDP                internet_usage 
#>                          1.46                          4.09 
#>                        pf_law                   pf_security 
#>                          6.30                          2.11 
#>                   pf_movement                   pf_religion 
#>                          3.71                          4.45 
#>                 pf_expression                   pf_identity 
#>                          5.14                          2.57 
#>                 ef_government                      ef_money 
#>                          1.86                          2.61 
#>                      ef_trade                 ef_regulation 
#>                          4.10                          2.50 
#>                    population 
#>                          1.48
reg_goal16_all_new <- lm(goal16 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal13 + 
                           unemployment.rate + GDPpercapita + MilitaryExpenditurePercentGDP + 
                           internet_usage + pf_law + pf_security + pf_movement + pf_religion + 
                           pf_expression + pf_identity + ef_government + ef_money + 
                           ef_regulation + population, data = data_question1)
selmod <- step(reg_goal16_all_new, scope=list(lower=nullmod, upper=reg_goal16_all_new), direction="backward") 
#> Start:  AIC=8945
#> goal16 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal13 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> - internet_usage                 1        44  88150 8944
#> <none>                                        88106 8945
#> - goal13                         1       146  88252 8947
#> - goal5                          1       246  88352 8950
#> - goal8                          1       289  88395 8951
#> - MilitaryExpenditurePercentGDP  1       346  88452 8953
#> - ef_regulation                  1       372  88478 8954
#> - ef_government                  1       633  88739 8961
#> - ef_money                       1      1032  89138 8972
#> - GDPpercapita                   1      1045  89151 8973
#> - goal2                          1      1068  89174 8973
#> - pf_movement                    1      1182  89288 8976
#> - pf_identity                    1      1559  89665 8987
#> - goal1                          1      1934  90040 8997
#> - pf_religion                    1      1970  90077 8998
#> - unemployment.rate              1      2497  90604 9013
#> - goal10                         1      3354  91460 9036
#> - population                     1      3463  91569 9039
#> - pf_expression                  1      4317  92423 9063
#> - pf_security                    1      5345  93451 9090
#> - pf_law                         1     12940 101046 9286
#> 
#> Step:  AIC=8944
#> goal16 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal13 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + pf_law + pf_security + 
#>     pf_movement + pf_religion + pf_expression + pf_identity + 
#>     ef_government + ef_money + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> <none>                                        88150 8944
#> - goal13                         1       134  88284 8946
#> - goal5                          1       208  88359 8948
#> - goal8                          1       266  88416 8950
#> - ef_regulation                  1       338  88489 8952
#> - MilitaryExpenditurePercentGDP  1       345  88495 8952
#> - ef_government                  1       615  88766 8960
#> - ef_money                       1       993  89143 8970
#> - GDPpercapita                   1      1035  89186 8971
#> - goal2                          1      1040  89191 8972
#> - pf_movement                    1      1156  89306 8975
#> - pf_identity                    1      1579  89730 8987
#> - goal1                          1      1911  90061 8996
#> - pf_religion                    1      2026  90177 8999
#> - unemployment.rate              1      2460  90610 9011
#> - goal10                         1      3309  91460 9034
#> - population                     1      3454  91605 9038
#> - pf_expression                  1      4518  92668 9067
#> - pf_security                    1      5305  93455 9088
#> - pf_law                         1     13031 101182 9287
vif(selmod)#pf_law
#>                         goal1                         goal2 
#>                          3.21                          1.95 
#>                         goal5                         goal8 
#>                          2.49                          3.99 
#>                        goal10                        goal13 
#>                          2.10                          3.71 
#>             unemployment.rate                  GDPpercapita 
#>                          1.93                          3.77 
#> MilitaryExpenditurePercentGDP                        pf_law 
#>                          1.41                          5.54 
#>                   pf_security                   pf_movement 
#>                          2.08                          3.57 
#>                   pf_religion                 pf_expression 
#>                          4.40                          5.03 
#>                   pf_identity                 ef_government 
#>                          2.22                          1.72 
#>                      ef_money                 ef_regulation 
#>                          1.96                          2.13 
#>                    population 
#>                          1.35
reg_goal16_all_new <- lm(goal16 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal13 + unemployment.rate + 
  GDPpercapita + MilitaryExpenditurePercentGDP + pf_security + 
  pf_movement + pf_religion + pf_expression + pf_identity + 
  ef_government + ef_money + ef_regulation + population, data = data_question1)
selmod <- step(reg_goal16_all_new, scope=list(lower=nullmod, upper=reg_goal16_all_new), direction="backward") 
#> Start:  AIC=9287
#> goal16 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal13 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + pf_security + 
#>     pf_movement + pf_religion + pf_expression + pf_identity + 
#>     ef_government + ef_money + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS  AIC
#> <none>                                       101182 9287
#> - goal13                         1        97 101279 9287
#> - goal5                          1       396 101577 9295
#> - goal8                          1       587 101769 9299
#> - goal2                          1       890 102072 9307
#> - pf_religion                    1       931 102113 9308
#> - pf_movement                    1       948 102130 9308
#> - ef_money                       1      1059 102241 9311
#> - MilitaryExpenditurePercentGDP  1      1224 102406 9315
#> - pf_identity                    1      2140 103321 9337
#> - ef_government                  1      2209 103391 9339
#> - goal1                          1      2349 103531 9342
#> - ef_regulation                  1      2407 103588 9344
#> - population                     1      3720 104902 9375
#> - GDPpercapita                   1      3968 105149 9381
#> - goal10                         1      4436 105617 9392
#> - unemployment.rate              1      5687 106869 9422
#> - pf_expression                  1      8505 109687 9487
#> - pf_security                    1     10506 111688 9532
vif(selmod) #pf_law
#>                         goal1                         goal2 
#>                          3.21                          1.95 
#>                         goal5                         goal8 
#>                          2.48                          3.97 
#>                        goal10                        goal13 
#>                          2.08                          3.58 
#>             unemployment.rate                  GDPpercapita 
#>                          1.84                          3.53 
#> MilitaryExpenditurePercentGDP                   pf_security 
#>                          1.38                          1.96 
#>                   pf_movement                   pf_religion 
#>                          3.57                          4.32 
#>                 pf_expression                   pf_identity 
#>                          4.82                          2.21 
#>                 ef_government                      ef_money 
#>                          1.66                          1.96 
#>                 ef_regulation                    population 
#>                          2.00                          1.35
reg_goal16_all_new <- lm(goal16 ~ goal1 + goal2 + goal5 + goal8 + goal10 + goal13 + unemployment.rate + 
                           GDPpercapita + MilitaryExpenditurePercentGDP + pf_security + 
                           pf_movement + pf_religion + pf_expression + pf_identity + 
                           ef_government + ef_money + ef_regulation + population, data = data_question1)

#reg17
nullmod <- lm(goal17 ~ 1, data = data_question1)
selmod <- step(reg_goal17_all_new, scope=list(lower=nullmod, upper=reg_goal17_all_new), direction="backward") 
#> Start:  AIC=10590
#> goal17 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + 
#>     goal11 + goal12 + goal13 + goal15 + goal16 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_religion + pf_expression + 
#>     pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
#>     population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - pf_religion                    1        81 169272 10589
#> - goal8                          1       106 169296 10589
#> - ef_regulation                  1       112 169303 10589
#> <none>                                       169190 10590
#> - goal6                          1       176 169366 10590
#> - internet_usage                 1       406 169596 10594
#> - goal13                         1       529 169719 10595
#> - goal15                         1       619 169809 10597
#> - ef_trade                       1       716 169906 10598
#> - ef_money                       1       754 169945 10599
#> - pf_identity                    1      1464 170655 10609
#> - goal7                          1      1494 170684 10610
#> - goal2                          1      1886 171077 10615
#> - goal10                         1      2136 171327 10619
#> - pf_expression                  1      2355 171546 10622
#> - pf_security                    1      2509 171699 10624
#> - pf_movement                    1      3305 172495 10636
#> - goal11                         1      3576 172767 10640
#> - pf_law                         1      3669 172859 10641
#> - unemployment.rate              1      4043 173233 10647
#> - MilitaryExpenditurePercentGDP  1      4542 173732 10654
#> - population                     1      6381 175571 10680
#> - GDPpercapita                   1      6447 175637 10681
#> - ef_government                  1      9821 179011 10729
#> - goal16                         1      9887 179077 10730
#> - goal12                         1     10095 179285 10732
#> - goal1                          1     12090 181281 10760
#> - goal5                          1     12314 181504 10763
#> 
#> Step:  AIC=10589
#> goal17 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + 
#>     goal11 + goal12 + goal13 + goal15 + goal16 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_expression + pf_identity + 
#>     ef_government + ef_money + ef_trade + ef_regulation + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> - ef_regulation                  1       112 169383 10588
#> - goal8                          1       120 169392 10589
#> <none>                                       169272 10589
#> - goal6                          1       198 169469 10590
#> - internet_usage                 1       386 169658 10593
#> - goal13                         1       562 169834 10595
#> - goal15                         1       631 169903 10596
#> - ef_trade                       1       702 169974 10597
#> - ef_money                       1       779 170051 10598
#> - pf_identity                    1      1388 170660 10607
#> - goal7                          1      1519 170791 10609
#> - goal2                          1      1921 171192 10615
#> - goal10                         1      2057 171329 10617
#> - pf_security                    1      2453 171724 10623
#> - pf_expression                  1      2541 171813 10624
#> - pf_movement                    1      3289 172561 10635
#> - goal11                         1      3551 172822 10639
#> - pf_law                         1      3976 173248 10645
#> - unemployment.rate              1      4002 173274 10645
#> - MilitaryExpenditurePercentGDP  1      4494 173766 10652
#> - GDPpercapita                   1      6427 175699 10680
#> - population                     1      6774 176045 10685
#> - goal12                         1     10137 179409 10732
#> - goal16                         1     10368 179640 10735
#> - ef_government                  1     10494 179765 10737
#> - goal5                          1     12472 181744 10764
#> - goal1                          1     12909 182181 10770
#> 
#> Step:  AIC=10588
#> goal17 ~ goal1 + goal2 + goal5 + goal6 + goal7 + goal8 + goal10 + 
#>     goal11 + goal12 + goal13 + goal15 + goal16 + unemployment.rate + 
#>     GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + 
#>     pf_law + pf_security + pf_movement + pf_expression + pf_identity + 
#>     ef_government + ef_money + ef_trade + population
#> 
#>                                 Df Sum of Sq    RSS   AIC
#> <none>                                       169383 10588
#> - goal8                          1       138 169521 10588
#> - goal6                          1       191 169575 10589
#> - internet_usage                 1       509 169892 10594
#> - ef_trade                       1       607 169991 10595
#> - goal13                         1       629 170012 10596
#> - goal15                         1       634 170017 10596
#> - ef_money                       1       811 170194 10598
#> - pf_identity                    1      1305 170688 10606
#> - goal7                          1      1409 170792 10607
#> - goal10                         1      1986 171370 10616
#> - goal2                          1      2032 171415 10616
#> - pf_security                    1      2466 171849 10623
#> - pf_expression                  1      2488 171871 10623
#> - pf_movement                    1      3445 172829 10637
#> - goal11                         1      3446 172830 10637
#> - pf_law                         1      3880 173263 10643
#> - unemployment.rate              1      4111 173495 10646
#> - MilitaryExpenditurePercentGDP  1      4429 173812 10651
#> - GDPpercapita                   1      6392 175775 10679
#> - population                     1      6690 176073 10683
#> - goal16                         1     10498 179882 10737
#> - goal12                         1     10567 179951 10738
#> - ef_government                  1     10585 179969 10738
#> - goal1                          1     13121 182504 10773
#> - goal5                          1     13337 182721 10776
vif(selmod) 
#>                         goal1                         goal2 
#>                          7.13                          2.15 
#>                         goal5                         goal6 
#>                          2.92                          7.05 
#>                         goal7                         goal8 
#>                          6.68                          4.21 
#>                        goal10                        goal11 
#>                          2.42                          6.68 
#>                        goal12                        goal13 
#>                         16.10                          6.02 
#>                        goal15                        goal16 
#>                          1.56                          6.65 
#>             unemployment.rate                  GDPpercapita 
#>                          2.12                          5.19 
#> MilitaryExpenditurePercentGDP                internet_usage 
#>                          1.46                          3.94 
#>                        pf_law                   pf_security 
#>                          6.46                          2.30 
#>                   pf_movement                 pf_expression 
#>                          3.43                          4.29 
#>                   pf_identity                 ef_government 
#>                          2.47                          1.71 
#>                      ef_money                      ef_trade 
#>                          2.62                          3.90 
#>                    population 
#>                          1.32
reg_goal17_all_new <- lm(goal17 ~ goal1 + goal2 + goal5 + goal10 + goal13 + goal15 + unemployment.rate + 
                           GDPpercapita + MilitaryExpenditurePercentGDP + internet_usage + pf_security + pf_movement + pf_religion + pf_expression +
                           pf_identity + ef_government + ef_money + ef_trade + ef_regulation + 
                           population, data=data_question1)
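
The `vif()` calls above (presumably `car::vif`) report, for each predictor, 1 / (1 - R²), where R² comes from regressing that predictor on all the others. A minimal sketch of that definition, using `mtcars` as a stand-in for our data; the cut-off of 5 is a hypothetical value chosen for illustration, not a threshold stated in this report.

```r
# Toy illustration on mtcars (not the report's data).
predictors <- c("wt", "hp", "disp", "qsec")

vif_by_hand <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  # Auxiliary regression: predictor v explained by the remaining predictors
  aux <- lm(reformulate(others, response = v), data = mtcars)
  1 / (1 - summary(aux)$r.squared)
})

round(vif_by_hand, 2)

# Hypothetical cut-off, for illustration only
vif_threshold <- 5
names(vif_by_hand)[vif_by_hand > vif_threshold]
```

A predictor with a large VIF (such as goal12 in the outputs above, with values around 16) is largely explained by the other predictors, which is why it is a candidate for removal before refitting.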

stargazer regressions

::: {.cell layout-align="center"}

Impact of variables over SDG goals 1,2
Dependent variable:
goal1 goal2
(1) (2)
goal5 -0.185*** 0.116***
(0.028) (0.013)
goal6 1.110*** 0.305***
(0.037) (0.019)
goal8 0.447*** 0.255***
(0.056) (0.028)
goal10 0.173*** -0.012*
(0.014) (0.007)
goal13 -0.259*** 0.084***
(0.027) (0.011)
goal15 -0.273*** -0.028**
(0.025) (0.012)
goal17 0.363*** -0.040***
(0.031) (0.015)
unemployment.rate 85.900*** 9.060***
(6.510) (3.140)
GDPpercapita -0.0003***
(0.00003)
MilitaryExpenditurePercentGDP 1.940*** -0.198
(0.286) (0.136)
internet_usage 17.800*** 3.030***
(1.780) (0.807)
pf_law -0.393**
(0.180)
pf_security 1.160***
(0.113)
pf_movement 1.460*** -0.619***
(0.316) (0.132)
pf_religion -4.090***
(0.266)
ef_government 4.450***
(0.304)
pf_identity -0.527***
(0.085)
ef_money -1.220*** 0.524***
(0.307) (0.146)
ef_trade 3.570*** 0.453**
(0.400) (0.192)
ef_regulation -1.040** -0.979***
(0.405) (0.194)
population 0.000***
(0.000)
Constant -41.600*** 11.700***
(5.810) (2.570)
Observations 2,499 2,499
R2 0.805 0.530
Adjusted R2 0.804 0.526
Residual Std. Error 14.200 (df = 2481) 6.760 (df = 2480)
F Statistic 602.000*** (df = 17; 2481) 155.000*** (df = 18; 2480)
Note: *p<0.1; **p<0.05; ***p<0.01

:::

Code
##### geom point #####

# Keep the variable pairs whose absolute correlation exceeds 0.8
highcorrelations <- melted_corr_matrix_GVar %>% filter(abs(value) > 0.8)

# Scatterplots of highly correlated pairs
ggplot(data_question1, aes(internet_usage, overallscore)) +
  geom_point() + geom_smooth(se = FALSE) +
  labs(title = "Scatterplot of overallscore and internet_usage")

ggplot(data_question1, aes(GDPpercapita, goal9)) +
  geom_point() + geom_smooth(se = FALSE) +
  labs(title = "Scatterplot of goal9 and GDPpercapita")

ggplot(data_question1, aes(internet_usage, goal9)) +
  geom_point() + geom_smooth(se = FALSE) +
  labs(title = "Scatterplot of goal9 and internet_usage")

ggplot(data_question1, aes(ef_legal, goal9)) +
  geom_point() + geom_smooth(se = FALSE) +
  labs(title = "Scatterplot of goal9 and ef_legal")

ggplot(data_question1, aes(pf_law, goal16)) +
  geom_point() + geom_smooth(se = FALSE) +
  labs(title = "Scatterplot of goal16 and pf_law")

ggplot(data_question1, aes(ef_legal, goal16)) +
  geom_point() + geom_smooth(se = FALSE) +
  labs(title = "Scatterplot of goal16 and ef_legal")

Let’s explore how the different SDGs are correlated with one another by creating a heatmap of the correlations between their scores. We also added a script to check whether each correlation is significantly different from 0. First, let’s select the SDG scores.

Code
sdg_scores <- Q4[, c('goal1', 'goal2', 'goal3', 'goal4', 'goal5', 'goal6',
                     'goal7', 'goal8', 'goal9', 'goal10', 'goal11', 'goal12',
                     'goal13', 'goal15', 'goal16', 'goal17')]

We then initialize the two matrices and compute the correlation and its p-value for each pair of SDG scores.

Code
n_goals <- ncol(sdg_scores)
goal_names <- colnames(sdg_scores)

# Empty matrices for the correlations and their p-values
cor_matrix <- matrix(nrow = n_goals, ncol = n_goals,
                     dimnames = list(goal_names, goal_names))
p_matrix <- cor_matrix

# Calculate correlation and p-values for every pair of goals
for (i in seq_len(n_goals)) {
  for (j in seq_len(n_goals)) {
    # [[ ]] extracts a plain vector from a data frame or a tibble
    test_result <- cor.test(sdg_scores[[i]], sdg_scores[[j]])
    cor_matrix[i, j] <- test_result$estimate
    p_matrix[i, j] <- test_result$p.value
  }
}
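To make the loop above concrete, here is a minimal sketch of what a single `cor.test` call returns. The data are simulated, and the names `toy`, `goal_a` and `goal_b` are hypothetical; they are not part of our dataset.

```r
# Simulated stand-ins for two SDG score columns (hypothetical values)
set.seed(1)
toy <- data.frame(goal_a = rnorm(50))
toy$goal_b <- 0.6 * toy$goal_a + rnorm(50, sd = 0.5)

# cor.test uses the Pearson correlation by default
res <- cor.test(toy$goal_a, toy$goal_b)

r <- unname(res$estimate)  # estimated correlation coefficient
p <- res$p.value           # p-value for H0: the true correlation is 0
```

The loop stores exactly these two elements, `estimate` and `p.value`, in `cor_matrix` and `p_matrix` for each pair of goals.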

We then reshape our data to be able to use the ggplot2 package to create our heatmap.

Code
melted_cor_matrix <-
  melt(cor_matrix)
melted_p_matrix <-
  melt(matrix(as.vector(p_matrix), nrow = ncol(sdg_scores)))

plot_data <- # Combine the datasets
  cbind(melted_cor_matrix, p_value = melted_p_matrix$value)

ggplot(plot_data, aes(Var1, Var2, fill = value)) +
  geom_tile() +
  geom_text(aes(label = sprintf("%.2f", value), color = p_value < 0.05),
            vjust = 1) +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
                       midpoint = 0, limits = c(-1, 1), space = "Lab",
                       name = "Pearson\nCorrelation") +
  # Named values: black when significant (p < 0.05), yellow if not
  scale_color_manual(values = c("TRUE" = "black", "FALSE" = "yellow")) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.text.y = element_text(angle = 45, hjust = 1),
        legend.position = "none") +
  labs(x = 'SDG Goals', y = 'SDG Goals',
       title = 'Correlation Matrix with Significance Indicator')

As noted above, we tested whether each correlation differs significantly from zero at the 5% level (alpha = 0.05). In the heatmap, coefficients that fail this test are printed in yellow and significant ones in black. No yellow labels appear on our plot, so every pair of Sustainable Development Goal (SDG) scores is significantly correlated.
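Reading significance off label colours can be error-prone on a 16 by 16 grid, so the same check could also be done programmatically. The sketch below is only an illustration on simulated data: `toy_scores`, `toy_p` and `not_significant` are hypothetical names, and the p-value matrix is built the same way as `p_matrix`.

```r
# Simulated scores: d is built from a, the other columns are independent noise
set.seed(42)
toy_scores <- data.frame(a = rnorm(40), b = rnorm(40), c = rnorm(40))
toy_scores$d <- toy_scores$a + rnorm(40, sd = 0.3)

# Matrix of pairwise p-values, filled like p_matrix
n <- ncol(toy_scores)
toy_p <- matrix(NA_real_, n, n,
                dimnames = list(names(toy_scores), names(toy_scores)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    toy_p[i, j] <- cor.test(toy_scores[[i]], toy_scores[[j]])$p.value
  }
}

# Keep each pair once (upper triangle, diagonal excluded) and
# list the pairs that are not significant at the 5% level
idx <- which(upper.tri(toy_p) & toy_p >= 0.05, arr.ind = TRUE)
not_significant <- data.frame(
  var1 = rownames(toy_p)[idx[, "row"]],
  var2 = colnames(toy_p)[idx[, "col"]],
  p_value = toy_p[idx]
)
```

An empty `not_significant` table would correspond to a heatmap without any yellow labels.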

We can look at the shape of the relationships between the SDG scores with the plot function, which draws a scatterplot matrix when given a data frame.

Code
plot(sdg_scores)

5.3 Different methods considered

5.4 Competing approaches

5.5 Justifications

6 Conclusion

  • Take home message
  • Limitations
  • Future work?